Posted to commits@thrift.apache.org by jf...@apache.org on 2014/04/05 18:22:02 UTC

[1/3] THRIFT-2450 - include HowToContribute in the src repo Client: build Patch: jfarrell

Repository: thrift
Updated Branches:
  refs/heads/master 6cf0ffcec -> 347a5ebb2


http://git-wip-us.apache.org/repos/asf/thrift/blob/347a5ebb/doc/thrift.tex
----------------------------------------------------------------------
diff --git a/doc/thrift.tex b/doc/thrift.tex
deleted file mode 100644
index a706fcb..0000000
--- a/doc/thrift.tex
+++ /dev/null
@@ -1,1057 +0,0 @@
-%-----------------------------------------------------------------------------
-%
-%               Thrift whitepaper
-%
-% Name:         thrift.tex
-%
-% Authors:      Mark Slee (mcslee@facebook.com)
-%
-% Created:      05 March 2007
-%
-% You will need a copy of sigplanconf.cls to format this document.
-% It is available at <http://www.sigplan.org/authorInformation.htm>.
-%
-%-----------------------------------------------------------------------------
-
-
-\documentclass[nocopyrightspace,blockstyle]{sigplanconf}
-
-\usepackage{amssymb}
-\usepackage{amsfonts}
-\usepackage{amsmath}
-\usepackage{url}
-
-\begin{document}
-
-% \conferenceinfo{WXYZ '05}{date, City.}
-% \copyrightyear{2007}
-% \copyrightdata{[to be supplied]}
-
-% \titlebanner{banner above paper title}        % These are ignored unless
-% \preprintfooter{short description of paper}   % 'preprint' option specified.
-
-\title{Thrift: Scalable Cross-Language Services Implementation}
-\subtitle{}
-
-\authorinfo{Mark Slee, Aditya Agarwal and Marc Kwiatkowski}
-           {Facebook, 156 University Ave, Palo Alto, CA}
-           {\{mcslee,aditya,marc\}@facebook.com}
-
-\maketitle
-
-\begin{abstract}
-Thrift is a software library and set of code-generation tools developed at
-Facebook to expedite development and implementation of efficient and scalable
-backend services. Its primary goal is to enable efficient and reliable
-communication across programming languages by abstracting the portions of each
-language that tend to require the most customization into a common library
-that is implemented in each language. Specifically, Thrift allows developers to
-define datatypes and service interfaces in a single language-neutral file
-and generate all the necessary code to build RPC clients and servers.
-
-This paper details the motivations and design choices we made in Thrift, as
-well as some of the more interesting implementation details. It is not
-intended to be taken as research, but rather it is an exposition on what we did
-and why.
-\end{abstract}
-
-% \category{D.3.3}{Programming Languages}{Language constructs and features}
-
-%\terms
-%Languages, serialization, remote procedure call
-
-%\keywords
-%Data description language, interface definition language, remote procedure call
-
-\section{Introduction}
-As Facebook's traffic and network structure have scaled, the resource
-demands of many operations on the site (e.g. search,
-ad selection and delivery, event logging) have presented technical requirements
-drastically outside the scope of the LAMP framework. In our implementation of
-these services, various programming languages have been selected to
-optimize for the right combination of performance, ease and speed of
-development, availability of existing libraries, etc. By and large,
-Facebook's engineering culture has tended towards choosing the best
-tools and implementations available over standardizing on any one
-programming language and begrudgingly accepting its inherent limitations.
-
-Given this design choice, we were presented with the challenge of building
-a transparent, high-performance bridge across many programming languages.
-We found that most available solutions were either too limited, did not offer
-sufficient datatype freedom, or suffered from subpar performance.
-\footnote{See Appendix A for a discussion of alternative systems.}
-
-The solution that we have implemented combines a language-neutral software
-stack implemented across numerous programming languages and an associated code
-generation engine that transforms a simple interface and data definition
-language into client and server remote procedure call libraries.
-Choosing static code generation over a dynamic system allows us to create
-validated code that can be run without the need for
-any advanced introspective run-time type checking. It is also designed to
-be as simple as possible for the developer, who can typically define all
-the necessary data structures and interfaces for a complex service in a single
-short file.
-
-Surprised that a robust open solution to these relatively common problems
-did not yet exist, we committed early on to making the Thrift implementation
-open source.
-
-In evaluating the challenges of cross-language interaction in a networked
-environment, some key components were identified:
-
-\textit{Types.} A common type system must exist across programming languages
-without requiring that the application developer use custom Thrift datatypes
-or write their own serialization code. That is,
-a C++ programmer should be able to transparently exchange a strongly typed
-STL map for a dynamic Python dictionary. Neither
-programmer should be forced to write any code below the application layer
-to achieve this. Section 2 details the Thrift type system.
-
-\textit{Transport.} Each language must have a common interface to
-bidirectional raw data transport. The specifics of how a given
-transport is implemented should not matter to the service developer.
-The same application code should be able to run against TCP stream sockets,
-raw data in memory, or files on disk. Section 3 details the Thrift Transport
-layer.
-
-\textit{Protocol.} Datatypes must have some way of using the Transport
-layer to encode and decode themselves. Again, the application
-developer need not be concerned by this layer. Whether the service uses
-an XML or binary protocol is immaterial to the application code.
-All that matters is that the data can be read and written in a consistent,
-deterministic manner. Section 4 details the Thrift Protocol layer.
-
-\textit{Versioning.} For robust services, the involved datatypes must
-provide a mechanism for versioning themselves. Specifically,
-it should be possible to add or remove fields in an object or alter the
-argument list of a function without any interruption in service (or,
-worse yet, nasty segmentation faults). Section 5 details Thrift's versioning
-system.
-
-\textit{Processors.} Finally, we generate code capable of processing data
-streams to accomplish remote procedure calls. Section 6 details the generated
-code and TProcessor paradigm.
-
-Section 7 discusses implementation details, and Section 8 describes
-our conclusions.
-
-\section{Types}
-
-The goal of the Thrift type system is to enable programmers to develop using
-completely natively defined types, no matter what programming language they
-use. By design, the Thrift type system does not introduce any special dynamic
-types or wrapper objects. It also does not require that the developer write
-any code for object serialization or transport. The Thrift IDL (Interface
-Definition Language) file is
-logically a way for developers to annotate their data structures with the
-minimal amount of extra information necessary to tell a code generator
-how to safely transport the objects across languages.
-
-\subsection{Base Types}
-
-The type system rests upon a few base types. In considering which types to
-support, we aimed for clarity and simplicity over abundance, focusing
-on the key types available in all programming languages, omitting any
-niche types available only in specific languages.
-
-The base types supported by Thrift are:
-\begin{itemize}
-\item \texttt{bool} A boolean value, true or false
-\item \texttt{byte} A signed byte
-\item \texttt{i16} A 16-bit signed integer
-\item \texttt{i32} A 32-bit signed integer
-\item \texttt{i64} A 64-bit signed integer
-\item \texttt{double} A 64-bit floating point number
-\item \texttt{string} An encoding-agnostic text or binary string
-\item \texttt{binary} A byte array representation for blobs
-\end{itemize}
-
-Of particular note is the absence of unsigned integer types. Because these
-types have no direct translation to native primitive types in many languages,
-the advantages they afford are lost. Further, there is no way to prevent the
-application developer in a language like Python from assigning a negative value
-to an integer variable, leading to unpredictable behavior. From a design
-standpoint, we observed that unsigned integers were very rarely, if ever, used
-for arithmetic purposes, but in practice were much more often used as keys or
-identifiers. In this case, the sign is irrelevant. Signed integers serve this
-same purpose and can be safely cast to their unsigned counterparts (most
-commonly in C++) when absolutely necessary.
-
-\subsection{Structs}
-
-A Thrift struct defines a common object to be used across languages. A struct
-is essentially equivalent to a class in object oriented programming
-languages. A struct has a set of strongly typed fields, each with a unique
-name identifier. The basic syntax for defining a Thrift struct looks very
-similar to a C struct definition. Fields may be annotated with an integer field
-identifier (unique to the scope of that struct) and optional default values.
-Field identifiers will be automatically assigned if omitted, though they are
-strongly encouraged for versioning reasons discussed later.
-
-\subsection{Containers}
-
-Thrift containers are strongly typed containers that map to the most commonly
-used containers in common programming languages. They are annotated using
-the C++ template (or Java Generics) style. There are three types available:
-\begin{itemize}
-\item \texttt{list<type>} An ordered list of elements. Translates directly into
-an STL \texttt{vector}, Java \texttt{ArrayList}, or native array in scripting languages. May
-contain duplicates.
-\item \texttt{set<type>} An unordered set of unique elements. Translates into
-an STL \texttt{set}, Java \texttt{HashSet}, \texttt{set} in Python, or native
-dictionary in PHP/Ruby.
-\item \texttt{map<type1,type2>} A map of strictly unique keys to values.
-Translates into an STL \texttt{map}, Java \texttt{HashMap}, PHP associative
-array, or Python/Ruby dictionary.
-\end{itemize}
-
-While defaults are provided, the type mappings are not explicitly fixed. Custom
-code generator directives have been added to substitute custom types in
-destination languages (e.g.
-\texttt{hash\_map} or Google's sparse hash map can be used in C++). The
-only requirement is that the custom types support all the necessary iteration
-primitives. Container elements may be of any valid Thrift type, including other
-containers or structs.
-
-\begin{verbatim}
-struct Example {
-  1:i32 number=10,
-  2:i64 bigNumber,
-  3:double decimals,
-  4:string name="thrifty"
-}\end{verbatim}
-
-In the target language, each definition generates a type with two methods,
-\texttt{read} and \texttt{write}, which perform serialization and transport
-of the objects using a Thrift TProtocol object.
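-
-Containers compose with structs in the same notation. As a minimal sketch
-(the struct and field names below are hypothetical, chosen only for
-illustration):
-
-\begin{verbatim}
-struct Inventory {
-  1:list<string> itemNames,
-  2:set<i32> discontinuedIds,
-  3:map<i32,Example> examplesById
-}\end{verbatim}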
-
-\subsection{Exceptions}
-
-Exceptions are syntactically and functionally equivalent to structs except
-that they are declared using the \texttt{exception} keyword instead of the
-\texttt{struct} keyword.
-
-The generated objects inherit from an exception base class as appropriate
-in each target programming language, in order to seamlessly
-integrate with native exception handling in any given
-language. Again, the design emphasis is on making the code familiar to the
-application developer.
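-
-For instance, the \texttt{KeyNotFound} exception referenced by the
-\texttt{StringCache} service example below might be declared as follows
-(a sketch; its fields are assumptions made for illustration):
-
-\begin{verbatim}
-exception KeyNotFound {
-  1:i32 key,
-  2:string message
-}\end{verbatim}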
-
-\subsection{Services}
-
-Services are defined using Thrift types. Definition of a service is
-semantically equivalent to defining an interface (or a pure virtual abstract
-class) in object oriented
-programming. The Thrift compiler generates fully functional client and
-server stubs that implement the interface. Services are defined as follows:
-
-\begin{verbatim}
-service <name> {
-  <returntype> <name>(<arguments>)
-    [throws (<exceptions>)]
-  ...
-}\end{verbatim}
-
-An example:
-
-\begin{verbatim}
-service StringCache {
-  void set(1:i32 key, 2:string value),
-  string get(1:i32 key) throws (1:KeyNotFound knf),
-  void delete(1:i32 key)
-}
-\end{verbatim}
-
-Note that \texttt{void} is a valid type for a function return, in addition to
-all other defined Thrift types. Additionally, an \texttt{async} modifier
-keyword may be added to a \texttt{void} function, which will generate code that does
-not wait for a response from the server. Note that a pure \texttt{void}
-function will return a response to the client which guarantees that the
-operation has completed on the server side. With \texttt{async} method calls
-the client will only be guaranteed that the request succeeded at the
-transport layer. (In many transport scenarios this is inherently unreliable
-due to the Byzantine Generals' Problem. Therefore, application developers
-should take care only to use the async optimization in cases where dropped
-method calls are acceptable or the transport is known to be reliable.)
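-
-As an illustrative sketch, a logging service for which an occasional dropped
-call is acceptable might mark its method \texttt{async}. The service and
-method names are hypothetical, and the sketch assumes that the modifier
-precedes the \texttt{void} return type:
-
-\begin{verbatim}
-service EventLog {
-  async void log(1:string message)
-}\end{verbatim}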
-
-Also of note is the fact that argument lists and exception lists for functions
-are implemented as Thrift structs. All three constructs are identical in both
-notation and behavior.
-
-\section{Transport}
-
-The transport layer is used by the generated code to facilitate data transfer.
-
-\subsection{Interface}
-
-A key design choice in the implementation of Thrift was to decouple the
-transport layer from the code generation layer. Though Thrift is typically
-used on top of the TCP/IP stack with streaming sockets as the base layer of
-communication, there was no compelling reason to build that constraint into
-the system. The performance tradeoff incurred by an abstracted I/O layer
-(roughly one virtual method lookup / function call per operation) was
-immaterial compared to the cost of actual I/O operations (typically invoking
-system calls).
-
-Fundamentally, generated Thrift code only needs to know how to read and
-write data. The origin and destination of the data are irrelevant; it may be a
-socket, a segment of shared memory, or a file on the local disk. The Thrift
-transport interface supports the following methods:
-
-\begin{itemize}
-\item \texttt{open} Opens the transport
-\item \texttt{close} Closes the transport
-\item \texttt{isOpen} Indicates whether the transport is open
-\item \texttt{read} Reads from the transport
-\item \texttt{write} Writes to the transport
-\item \texttt{flush} Forces any pending writes
-\end{itemize}
-
-There are a few additional methods not documented here which are used to aid
-in batching reads and optionally signaling the completion of a read or
-write operation from the generated code.
-
-In addition to the above
-\texttt{TTransport} interface, there is a\\
-\texttt{TServerTransport} interface
-used to accept or create primitive transport objects. Its interface is as
-follows:
-
-\begin{itemize}
-\item \texttt{open} Opens the transport
-\item \texttt{listen} Begins listening for connections
-\item \texttt{accept} Returns a new client transport
-\item \texttt{close} Closes the transport
-\end{itemize}
-
-\subsection{Implementation}
-
-The transport interface is designed for simple implementation in any
-programming language. New transport mechanisms can be easily defined as needed
-by application developers.
-
-\subsubsection{TSocket}
-
-The \texttt{TSocket} class is implemented across all target languages. It
-provides a common, simple interface to a TCP/IP stream socket.
-
-\subsubsection{TFileTransport}
-
-The \texttt{TFileTransport} is an abstraction of an on-disk file to a data
-stream. It can be used to write out a set of incoming Thrift requests to a file
-on disk. The on-disk data can then be replayed from the log, either for
-post-processing or for reproduction and/or simulation of past events.
-
-\subsubsection{Utilities}
-
-The Transport interface is designed to support easy extension using common
-OOP techniques, such as composition. Some simple utilities include the
-\texttt{TBufferedTransport}, which buffers the writes and reads on an
-underlying transport, the \texttt{TFramedTransport}, which transmits data with frame
-size headers for chunking optimization or nonblocking operation, and the
-\texttt{TMemoryBuffer}, which allows reading and writing directly from the heap
-or stack memory owned by the process.
-
-\section{Protocol}
-
-A second major abstraction in Thrift is the separation of data structure from
-transport representation. Thrift enforces a certain messaging structure when
-transporting data, but it is agnostic to the protocol encoding in use. That is,
-it does not matter whether data is encoded as XML, human-readable ASCII, or a
-dense binary format as long as the data supports a fixed set of operations
-that allow it to be deterministically read and written by generated code.
-
-\subsection{Interface}
-
-The Thrift Protocol interface is very straightforward. It fundamentally
-supports two things: 1) bidirectional sequenced messaging, and
-2) encoding of base types, containers, and structs.
-
-\begin{verbatim}
-writeMessageBegin(name, type, seq)
-writeMessageEnd()
-writeStructBegin(name)
-writeStructEnd()
-writeFieldBegin(name, type, id)
-writeFieldEnd()
-writeFieldStop()
-writeMapBegin(ktype, vtype, size)
-writeMapEnd()
-writeListBegin(etype, size)
-writeListEnd()
-writeSetBegin(etype, size)
-writeSetEnd()
-writeBool(bool)
-writeByte(byte)
-writeI16(i16)
-writeI32(i32)
-writeI64(i64)
-writeDouble(double)
-writeString(string)
-
-name, type, seq = readMessageBegin()
-                  readMessageEnd()
-name =            readStructBegin()
-                  readStructEnd()
-name, type, id =  readFieldBegin()
-                  readFieldEnd()
-k, v, size =      readMapBegin()
-                  readMapEnd()
-etype, size =     readListBegin()
-                  readListEnd()
-etype, size =     readSetBegin()
-                  readSetEnd()
-bool =            readBool()
-byte =            readByte()
-i16 =             readI16()
-i32 =             readI32()
-i64 =             readI64()
-double =          readDouble()
-string =          readString()
-\end{verbatim}
-
-Note that every \texttt{write} function has exactly one \texttt{read} counterpart, with
-the exception of \texttt{writeFieldStop()}. This is a special method
-that signals the end of a struct. The procedure for reading a struct is to
-\texttt{readFieldBegin()} until the stop field is encountered, and then to
-\texttt{readStructEnd()}.  The
-generated code relies upon this call sequence to ensure that everything written by
-a protocol encoder can be read by a matching protocol decoder. Further note
-that this set of functions is by design more robust than necessary.
-For example, \texttt{writeStructEnd()} is not strictly necessary, as the end of
-a struct may be implied by the stop field. This method is a convenience for
-verbose protocols in which it is cleaner to separate these calls (e.g. a closing
-\texttt{</struct>} tag in XML).
-
-\subsection{Structure}
-
-Thrift structures are designed to support encoding into a streaming
-protocol. The implementation should never need to frame or compute the
-entire data length of a structure prior to encoding it. This is critical to
-performance in many scenarios. Consider a long list of relatively large
-strings. If the protocol interface required reading or writing a list to be an
-atomic operation, then the implementation would need to perform a linear pass over the
-entire list before encoding any data. However, if the list can be written
-as iteration is performed, the corresponding read may begin in parallel,
-theoretically offering an end-to-end speedup of $(kN - C)$, where $N$ is the size
-of the list, $k$ the cost factor associated with serializing a single
-element, and $C$ is a fixed offset for the delay between data being written
-and becoming available to read.
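-
-One reading of this estimate, under the simplifying assumption (made here
-only for illustration) that reading an element costs the same $k$ as writing
-it: if the list is handled atomically, the reader cannot start until the
-writer has finished, so the end-to-end time is
-\[ T_{\mathrm{atomic}} = kN + kN = 2kN. \]
-If elements are streamed, the reader trails the writer by the fixed delay
-$C$, so
-\[ T_{\mathrm{stream}} = kN + C, \]
-and the time saved is
-\[ T_{\mathrm{atomic}} - T_{\mathrm{stream}} = kN - C. \]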
-
-Similarly, structs do not encode their data lengths a priori. Instead, they are
-encoded as a sequence of fields, with each field having a type specifier and a
-unique field identifier. Note that the inclusion of type specifiers allows
-the protocol to be safely parsed and decoded without any generated code
-or access to the original IDL file. Structs are terminated by a field header
-with a special \texttt{STOP} type. Because all the basic types can be read
-deterministically, all structs (even those containing other structs) can be
-read deterministically. The Thrift protocol is self-delimiting without any
-framing and regardless of the encoding format.
-
-In situations where streaming is unnecessary or framing is advantageous, it
-can be very simply added into the transport layer, using the
-\texttt{TFramedTransport} abstraction.
-
-\subsection{Implementation}
-
-Facebook has implemented and deployed a space-efficient binary protocol which
-is used by most backend services. Essentially, it writes all data
-in a flat binary format. Integer types are converted to network byte order,
-strings are prepended with their byte length, and all message and field headers
-are written using the primitive integer serialization constructs. String names
-for fields are omitted; when using generated code, field identifiers are
-sufficient.
-
-We decided against some extreme storage optimizations (e.g. packing
-small integers into ASCII or using a 7-bit continuation format) for the sake
-of simplicity and clarity in the code. These alterations can easily be made
-if and when we encounter a performance-critical use case that demands them.
-
-\section{Versioning}
-
-Thrift is robust in the face of versioning and data definition changes. This
-is critical to enable staged rollouts of changes to deployed services. The
-system must be able to support reading of old data from log files, as well as
-requests from out-of-date clients to new servers, and vice versa.
-
-\subsection{Field Identifiers}
-
-Versioning in Thrift is implemented via field identifiers. The field header
-for every member of a struct in Thrift is encoded with a unique field
-identifier. The combination of this field identifier and its type specifier
-is used to uniquely identify the field. The Thrift definition language
-supports automatic assignment of field identifiers, but it is good
-programming practice to always explicitly specify field identifiers.
-Identifiers are specified as follows:
-
-\begin{verbatim}
-struct Example {
-  1:i32 number=10,
-  2:i64 bigNumber,
-  3:double decimals,
-  4:string name="thrifty"
-}\end{verbatim}
-
-To avoid conflicts between manually and automatically assigned identifiers,
-fields with identifiers omitted are assigned identifiers
-decrementing from -1, and the language only supports the manual assignment of
-positive identifiers.
-
-When data is being deserialized, the generated code can use these identifiers
-to properly identify the field and determine whether it aligns with a field in
-its definition file. If a field identifier is not recognized, the generated
-code can use the type specifier to skip the unknown field without any error.
-Again, this is possible due to the fact that all datatypes are
-self-delimiting.
-
-Field identifiers can (and should) also be specified in function argument
-lists. In fact, argument lists are not only represented as structs on the
-backend, but actually share the same code in the compiler frontend. This
-allows for version-safe modification of method parameters.
-
-\begin{verbatim}
-service StringCache {
-  void set(1:i32 key, 2:string value),
-  string get(1:i32 key) throws (1:KeyNotFound knf),
-  void delete(1:i32 key)
-}
-\end{verbatim}
-
-The syntax for specifying field identifiers was chosen to echo their structure.
-Structs can be thought of as a dictionary where the identifiers are keys, and
-the values are strongly-typed named fields.
-
-Field identifiers internally use the \texttt{i16} Thrift type. Note, however,
-that the \texttt{TProtocol} abstraction may encode identifiers in any format.
-
-\subsection{Isset}
-
-When an unexpected field is encountered, it can be safely ignored and
-discarded. When an expected field is not found, there must be some way to
-signal to the developer that it was not present. This is implemented via an
-inner \texttt{isset} structure inside the defined objects. (Isset functionality
-is implicit with a \texttt{null} value in PHP, \texttt{None} in Python
-and \texttt{nil} in Ruby.) Essentially,
-the inner \texttt{isset} object of each Thrift struct contains a boolean value
-for each field which denotes whether or not that field is present in the
-struct. When a reader receives a struct, it should check for a field being set
-before operating directly on it.
-
-\begin{verbatim}
-class Example {
- public:
-  Example() :
-    number(10),
-    bigNumber(0),
-    decimals(0),
-    name("thrifty") {}
-
-  int32_t number;
-  int64_t bigNumber;
-  double decimals;
-  std::string name;
-
-  struct __isset {
-    __isset() :
-      number(false),
-      bigNumber(false),
-      decimals(false),
-      name(false) {}
-    bool number;
-    bool bigNumber;
-    bool decimals;
-    bool name;
-  } __isset;
-...
-}
-\end{verbatim}
-
-\subsection{Case Analysis}
-
-There are four cases in which version mismatches may occur.
-
-\begin{enumerate}
-\item \textit{Added field, old client, new server.} In this case, the old
-client does not send the new field. The new server recognizes that the field
-is not set, and implements default behavior for out-of-date requests.
-\item \textit{Removed field, old client, new server.} In this case, the old
-client sends the removed field. The new server simply ignores it.
-\item \textit{Added field, new client, old server.} The new client sends a
-field that the old server does not recognize. The old server simply ignores
-it and processes as normal.
-\item \textit{Removed field, new client, old server.} This is the most
-dangerous case, as the old server is unlikely to have suitable default
-behavior implemented for the missing field. It is recommended that in this
-situation the new server be rolled out prior to the new clients.
-\end{enumerate}
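-
-As a sketch of the first and third cases, consider two revisions of the same
-definition file (the struct and its fields are hypothetical). The second
-revision adds field 3 and leaves the existing identifiers untouched:
-
-\begin{verbatim}
-// Revision 1
-struct UserProfile {
-  1:i32 uid,
-  2:string name
-}
-
-// Revision 2
-struct UserProfile {
-  1:i32 uid,
-  2:string name,
-  3:string email
-}\end{verbatim}
-
-A reader built from revision 1 skips field 3 using its type specifier, while a
-reader built from revision 2 finds field 3 unset in data written by revision 1.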
-
-\subsection{Protocol/Transport Versioning}
-The \texttt{TProtocol} abstractions are also designed to give protocol
-implementations the freedom to version themselves in whatever manner they
-see fit. Specifically, any protocol implementation is free to send whatever
-it likes in the \texttt{writeMessageBegin()} call. It is entirely up to the
-implementor how to handle versioning at the protocol level. The key point is
-that protocol encoding changes are safely isolated from interface definition
-version changes.
-
-Note that the exact same is true of the \texttt{TTransport} interface. For
-example, if we wished to add some new checksumming or error detection to the
-\texttt{TFileTransport}, we could simply add a version header into the
-data it writes to the file in such a way that it would still accept old
-log files without the given header.
-
-\section{RPC Implementation}
-
-\subsection{TProcessor}
-
-The last core interface in the Thrift design is the \texttt{TProcessor},
-perhaps the simplest of the constructs. The interface is as follows:
-
-\begin{verbatim}
-interface TProcessor {
-  bool process(TProtocol in, TProtocol out)
-    throws TException
-}
-\end{verbatim}
-
-The key design idea here is that the complex systems we build can fundamentally
-be broken down into agents or services that operate on inputs and outputs. In
-most cases, there is actually just one input and output (an RPC client) that
-needs handling.
-
-\subsection{Generated Code}
-
-When a service is defined, we generate a
-\texttt{TProcessor} instance capable of handling RPC requests to that service,
-using a few helpers. The fundamental structure (illustrated in pseudo-C++) is
-as follows:
-
-\begin{verbatim}
-Service.thrift
- => Service.cpp
-     interface ServiceIf
-     class ServiceClient : virtual ServiceIf
-       TProtocol in
-       TProtocol out
-     class ServiceProcessor : TProcessor
-       ServiceIf handler
-
-ServiceHandler.cpp
- class ServiceHandler : virtual ServiceIf
-
-TServer.cpp
- TServer(TProcessor processor,
-         TServerTransport transport,
-         TTransportFactory tfactory,
-         TProtocolFactory pfactory)
- serve()
-\end{verbatim}
-
-From the Thrift definition file, we generate the virtual service interface.
-A client class is generated, which implements the interface and
-uses two \texttt{TProtocol} instances to perform the I/O operations. The
-generated processor implements the \texttt{TProcessor} interface. The generated
-code has all the logic to handle RPC invocations via the \texttt{process()}
-call, and takes as a parameter an instance of the service interface, as
-implemented by the application developer.
-
-The user provides an implementation of the application interface in separate,
-non-generated source code.
-
-\subsection{TServer}
-
-Finally, the Thrift core libraries provide a \texttt{TServer} abstraction.
-The \texttt{TServer} object generally works as follows.
-
-\begin{itemize}
-\item Use the \texttt{TServerTransport} to get a \texttt{TTransport}
-\item Use the \texttt{TTransportFactory} to optionally convert the primitive
-transport into a suitable application transport (typically the
-\texttt{TBufferedTransportFactory} is used here)
-\item Use the \texttt{TProtocolFactory} to create an input and output protocol
-for the \texttt{TTransport}
-\item Invoke the \texttt{process()} method of the \texttt{TProcessor} object
-\end{itemize}
-
-The layers are appropriately separated such that the server code needs to know
-nothing about any of the transports, encodings, or applications in play. The
-server encapsulates the logic around connection handling, threading, etc.
-while the processor deals with RPC. The only code written by the application
-developer lives in the definitional Thrift file and the interface
-implementation.
-
-Facebook has deployed multiple \texttt{TServer} implementations, including
-the single-threaded \texttt{TSimpleServer}, thread-per-connection
-\texttt{TThreadedServer}, and thread-pooling \texttt{TThreadPoolServer}.
-
-The \texttt{TProcessor} interface is very general by design. There is no
-requirement that a \texttt{TServer} take a generated \texttt{TProcessor}
-object. Thrift allows the application developer to easily write any type of
-server that operates on \texttt{TProtocol} objects (for instance, a server
-could simply stream a certain type of object without any actual RPC method
-invocation).
-
-\section{Implementation Details}
-\subsection{Target Languages}
-Thrift currently supports five target languages: C++, Java, Python, Ruby, and
-PHP. At Facebook, we have deployed servers predominantly in C++, Java, and
-Python. Thrift services implemented in PHP have also been embedded into the
-Apache web server, providing transparent backend access to many of our
-frontend constructs using a \texttt{THttpClient} implementation of the
-\texttt{TTransport} interface.
-
-Though Thrift was explicitly designed to be much more efficient and robust
-than typical web technologies, as we were designing our XML-based REST web
-services API we noticed that Thrift could be easily used to define our
-service interface. Though we do not currently employ SOAP envelopes (in the
-authors' opinions there is already far too much repetitive enterprise Java
-software to do that sort of thing), we were able to quickly extend Thrift to
-generate XML Schema Definition files for our service, as well as a framework
-for versioning different implementations of our web service. Though public
-web services are admittedly tangential to Thrift's core use case and design,
-Thrift facilitated rapid iteration and affords us the ability to quickly
-migrate our entire XML-based web service onto a higher performance system
-should the need arise.
-
-\subsection{Generated Structs}
-We made a conscious decision to make our generated structs as transparent as
-possible. All fields are publicly accessible; there are no \texttt{set()} and
-\texttt{get()} methods. Similarly, use of the \texttt{isset} object is not
-enforced. We do not include any \texttt{FieldNotSetException} construct.
-Developers have the option to use these fields to write more robust code, but
-the system is robust to the developer ignoring the \texttt{isset} construct
-entirely and will provide suitable default behavior in all cases.
-
-This choice was motivated by the desire to ease application development. Our stated
-goal is not to make developers learn a rich new library in their language of
-choice, but rather to generate code that allows them to work with the constructs
-that are most familiar in each language.
-
-We also made the \texttt{read()} and \texttt{write()} methods of the generated
-objects public so that the objects can be used outside of the context
-of RPC clients and servers. Thrift is a useful tool simply for generating
-objects that are easily serializable across programming languages.
-
-\subsection{RPC Method Identification}
-Method calls in RPC are implemented by sending the method name as a string. One
-issue with this approach is that longer method names require more bandwidth.
-We experimented with using fixed-size hashes to identify methods, but in the
-end concluded that the savings were not worth the headaches incurred. Reliably
-dealing with conflicts across versions of an interface definition file is
-impossible without a meta-storage system (i.e. to generate non-conflicting
-hashes for the current version of a file, we would have to know about all
-conflicts that ever existed in any previous version of the file).
-
-We wanted to avoid too many unnecessary string comparisons upon
-method invocation. To deal with this, we generate maps from strings to function
-pointers, so that invocation is effectively accomplished via a constant-time
-hash lookup in the common case. This requires the use of a couple of interesting
-code constructs. Because Java does not have function pointers, process
-functions are all private member classes implementing a common interface.
-
-\begin{verbatim}
-private class ping implements ProcessFunction {
-  public void process(int seqid,
-                      TProtocol iprot,
-                      TProtocol oprot)
-    throws TException
-  { ...}
-}
-
-HashMap<String,ProcessFunction> processMap_ =
-  new HashMap<String,ProcessFunction>();
-\end{verbatim}
-
-In C++, we use a relatively esoteric language construct: member function
-pointers.
-
-\begin{verbatim}
-std::map<std::string,
-  void (ExampleServiceProcessor::*)(int32_t,
-  facebook::thrift::protocol::TProtocol*,
-  facebook::thrift::protocol::TProtocol*)>
- processMap_;
-\end{verbatim}
-
-Using these techniques, the cost of string processing is minimized, and we
-reap the benefit of being able to easily debug corrupt or misunderstood data by
-inspecting it for known string method names.
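As a rough illustration of the dispatch technique described in this subsection, the following C++ sketch maps method-name strings to member function pointers. The class `ExampleProcessor`, its handlers, and the `dispatch` helper are hypothetical names chosen for this example, and the handler signature is simplified to a single sequence id rather than the protocol arguments shown earlier. Note that `std::map` is an ordered tree, so its lookups are logarithmic; a hash-based container would be needed for a true constant-time lookup.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a generated processor class (illustrative only).
class ExampleProcessor {
 public:
  ExampleProcessor() {
    // Register each handler under the method name sent on the wire.
    processMap_["ping"] = &ExampleProcessor::process_ping;
    processMap_["add"] = &ExampleProcessor::process_add;
  }

  // Look up the handler by name and invoke it through the member function
  // pointer. Returns false when no handler is registered for the name.
  bool dispatch(const std::string& name, int32_t seqid) {
    auto it = processMap_.find(name);
    if (it == processMap_.end()) {
      return false;
    }
    (this->*(it->second))(seqid);
    return true;
  }

  // Recorded so that callers can observe which handler ran.
  std::string lastMethod;
  int32_t lastSeqid = -1;

 private:
  void process_ping(int32_t seqid) { lastMethod = "ping"; lastSeqid = seqid; }
  void process_add(int32_t seqid) { lastMethod = "add"; lastSeqid = seqid; }

  typedef void (ExampleProcessor::*ProcessFn)(int32_t);
  std::map<std::string, ProcessFn> processMap_;
};
```

Calling `dispatch("ping", 7)` on an instance runs the `ping` handler, while an unknown method name leaves the processor state untouched and returns `false`.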
-
-\subsection{Servers and Multithreading}
-Thrift services require basic multithreading to handle simultaneous
-requests from multiple clients. For the Python and Java implementations of
-Thrift server logic, the standard threading libraries distributed with the
-languages provide adequate support. For the C++ implementation, no standard multithread runtime
-library exists. Specifically, robust, lightweight, and portable
-thread manager and timer class implementations do not exist. We investigated
-existing implementations, namely \texttt{boost::thread},
-\texttt{boost::threadpool}, \texttt{ACE\_Thread\_Manager} and
-\texttt{ACE\_Timer}.
-
-While \texttt{boost::thread}\cite{boost.threads} provides clean,
-lightweight and robust implementations of multi-thread primitives (mutexes,
-conditions, threads), it does not provide a thread manager or timer
-implementation.
-
-\texttt{boost::threadpool}\cite{boost.threadpool} also looked promising but
-was not far enough along for our purposes. We wanted to limit the dependency on
-third-party libraries as much as possible. Because\\
-\texttt{boost::threadpool} is
-not a pure template library and requires runtime libraries, and because it is
-not yet part of the official Boost distribution, we felt it was not ready for
-use in Thrift. As \texttt{boost::threadpool} evolves, and especially if it is
-added to the Boost distribution, we may reconsider our decision not to use it.
-
-ACE has both a thread manager and timer class in addition to multi-thread
-primitives. The biggest problem with ACE is that it is ACE. Unlike Boost, ACE
-API quality is poor. Everything in ACE has large numbers of dependencies on
-everything else in ACE, thus forcing developers to throw out standard
-classes, such as STL collections, in favor of ACE's homebrewed implementations. In
-addition, unlike Boost, ACE implementations demonstrate little understanding
-of the power and pitfalls of C++ programming and take no advantage of modern
-templating techniques to ensure compile time safety and reasonable compiler
-error messages. For all these reasons, ACE was rejected. Instead, we chose
-to implement our own library, described in the following sections.
-
-\subsection{Thread Primitives}
-
-The Thrift thread libraries are implemented in the namespace\\
-\texttt{facebook::thrift::concurrency} and have three components:
-\begin{itemize}
-\item primitives
-\item thread pool manager
-\item timer manager
-\end{itemize}
-
-As mentioned above, we were hesitant to introduce any additional dependencies
-into Thrift. We decided to use \texttt{boost::shared\_ptr} because it is so
-useful for multithreaded applications, it requires no link-time or
-runtime libraries (i.e. it is a pure template library), and it is due
-to become part of the C++0x standard.
-
-We implement standard \texttt{Mutex} and \texttt{Condition} classes, and a
- \texttt{Monitor} class. The latter is simply a combination of a mutex and
-condition variable and is analogous to the \texttt{Monitor} implementation provided for
-the Java \texttt{Object} class. This is also sometimes referred to as a barrier. We
-provide a \texttt{Synchronized} guard class to allow Java-like synchronized blocks.
-This is just a bit of syntactic sugar, but, like its Java counterpart, clearly
-delimits critical sections of code. Unlike its Java counterpart, we still
-have the ability to programmatically lock, unlock, block, and signal monitors.
-
-\begin{verbatim}
-void run() {
- {Synchronized s(manager->monitor);
-  if (manager->state == TimerManager::STARTING) {
-    manager->state = TimerManager::STARTED;
-    manager->monitor.notifyAll();
-  }
- }
-}
-\end{verbatim}
-
-We again borrowed from Java the distinction between a thread and a runnable
-class. A \texttt{Thread} is the actual schedulable object. The
-\texttt{Runnable} is the logic to execute within the thread.
-The \texttt{Thread} implementation deals with all the platform-specific thread
-creation and destruction issues, while the \texttt{Runnable} implementation deals
-with the application-specific per-thread logic. The benefit of this approach
-is that developers can easily subclass the Runnable class without pulling in
-platform-specific super-classes.
-
-\subsection{Thread, Runnable, and shared\_ptr}
-We use \texttt{boost::shared\_ptr} throughout the \texttt{ThreadManager} and
-\texttt{TimerManager} implementations to guarantee cleanup of dead objects that can
-be accessed by multiple threads. For \texttt{Thread} class implementations,
-\texttt{boost::shared\_ptr} usage requires particular attention to make sure
-\texttt{Thread} objects are neither leaked nor dereferenced prematurely while
-creating and shutting down threads.
-
-Thread creation requires calling into a C library (in our case the POSIX
-thread library, \texttt{libpthread}, but the same would be true for WIN32 threads).
-Typically, the OS makes few, if any, guarantees about when \texttt{ThreadMain}, a C thread's entry-point function, will be called. Therefore, it is
-possible that our thread create call,
-\texttt{ThreadFactory::newThread()}, could return to the caller
-well before that time. To ensure that the returned \texttt{Thread} object is not
-prematurely cleaned up if the caller gives up its reference prior to the
-\texttt{ThreadMain} call, the \texttt{Thread} object makes a weak reference to
-itself in its \texttt{start} method.
-
-With the weak reference in hand the \texttt{ThreadMain} function can attempt to get
-a strong reference before entering the \texttt{Runnable::run} method of the
-\texttt{Runnable} object bound to the \texttt{Thread}. If no strong references to the
-thread are obtained between exiting \texttt{Thread::start} and entering \texttt{ThreadMain}, the weak reference returns \texttt{null} and the function
-exits immediately.
-
-The need for the \texttt{Thread} to make a weak reference to itself has a
-significant impact on the API. Since references are managed through the
-\texttt{boost::shared\_ptr} templates, the \texttt{Thread} object must have a reference
-to itself wrapped by the same \texttt{boost::shared\_ptr} envelope that is returned
-to the caller. This necessitated the use of the factory pattern.
-\texttt{ThreadFactory} creates the raw \texttt{Thread} object and a
-\texttt{boost::shared\_ptr} wrapper, and calls a private helper method of the class
-implementing the \texttt{Thread} interface (in this case, \texttt{PosixThread::weakRef})
- to allow it to make a weak reference to itself through the
- \texttt{boost::shared\_ptr} envelope.
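The self-reference scheme can be sketched roughly as follows. This is an illustration under stated assumptions rather than Thrift's actual code: `std::shared_ptr` and `std::weak_ptr` stand in for the Boost templates, no real OS thread is created, and `SketchThread`, `newThread`, and `threadMain` are hypothetical names that only mimic the roles of the thread class, `ThreadFactory::newThread`, and the `ThreadMain` entry point.

```cpp
#include <memory>

// Hypothetical thread object; it holds only a weak reference to itself.
class SketchThread {
 public:
  // Plays the role of the private helper: remember the shared envelope
  // that the factory hands back to the caller, without owning it.
  void weakRef(const std::shared_ptr<SketchThread>& self) { self_ = self; }

  // The weak reference that would be passed to the OS-level entry point.
  std::weak_ptr<SketchThread> self() const { return self_; }

  // Stand-in for the entry-point function. It tries to upgrade the weak
  // reference to a strong one before running any per-thread logic.
  // Returns true if the body ran, false if the object was already released.
  static bool threadMain(const std::weak_ptr<SketchThread>& weak) {
    std::shared_ptr<SketchThread> strong = weak.lock();
    if (!strong) {
      return false;  // every strong reference was dropped; exit immediately
    }
    strong->ran = true;  // placeholder for invoking the runnable's logic
    return true;
  }

  bool ran = false;

 private:
  std::weak_ptr<SketchThread> self_;
};

// Factory: creates the raw object and its shared envelope, then lets the
// object record a weak reference through that same envelope.
std::shared_ptr<SketchThread> newThread() {
  std::shared_ptr<SketchThread> thread(new SketchThread());
  thread->weakRef(thread);
  return thread;
}
```

If the caller still holds the pointer returned by `newThread()` when `threadMain` runs, the upgrade succeeds and the body executes; if the caller has already reset it, `lock()` yields an empty pointer and the function returns without touching the destroyed object.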
-
-\texttt{Thread} and \texttt{Runnable} objects reference each other. A \texttt{Runnable}
-object may need to know about the thread in which it is executing, and a \texttt{Thread}, obviously,
-needs to know what \texttt{Runnable} object it is hosting. This interdependency is
-further complicated because the lifecycle of each object is independent of the
-other. An application may create a set of \texttt{Runnable} objects to be reused in different threads, or it may create and forget a \texttt{Runnable} object
-once a thread has been created and started for it.
-
-The \texttt{Thread} class takes a \texttt{boost::shared\_ptr} reference to the hosted
-\texttt{Runnable} object in its constructor, while the \texttt{Runnable} class has an
-explicit \texttt{thread} method to allow explicit binding of the hosted thread.
-\texttt{ThreadFactory::newThread} binds the objects to each other.
-
-\subsection{ThreadManager}
-
-\texttt{ThreadManager} creates a pool of worker threads and
-allows applications to schedule tasks for execution as free worker threads
-become available. The \texttt{ThreadManager} does not implement dynamic
-thread pool resizing, but provides primitives so that applications can add
-and remove threads based on load. This approach was chosen because
-implementing load metrics and thread pool sizing is very application
-specific. For example, some applications may want to adjust pool size based
-on a running average of work arrival rates that are measured via polled
-samples. Others may simply wish to react immediately to work-queue
-depth high and low water marks. Rather than trying to create a complex
-API abstract enough to capture these different approaches, we
-simply leave it up to the particular application and provide the
-primitives to enact the desired policy and sample current status.
-
-\subsection{TimerManager}
-
-\texttt{TimerManager} allows applications to schedule
- \texttt{Runnable} objects for execution at some point in the future. Its specific task
-is to allow applications to sample \texttt{ThreadManager} load at regular
-intervals and make changes to the thread pool size based on application policy.
-Of course, it can be used to generate any number of timer or alarm events.
-
-The default implementation of \texttt{TimerManager} uses a single thread to
-execute expired \texttt{Runnable} objects. Thus, if a timer operation needs to
-do a large amount of work and especially if it needs to do blocking I/O,
-that should be done in a separate thread.
-
-\subsection{Nonblocking Operation}
-Though the Thrift transport interfaces map more directly to a blocking I/O
-model, we have implemented a high performance \texttt{TNonBlockingServer}
-in C++ based on \texttt{libevent} and the \texttt{TFramedTransport}. We
-implemented this by moving all I/O into one tight event loop using a
-state machine. Essentially, the event loop reads framed requests into
-\texttt{TMemoryBuffer} objects. Once entire requests are ready, they are
-dispatched to the \texttt{TProcessor} object which can read directly from
-the data in memory.
-
-\subsection{Compiler}
-The Thrift compiler is implemented in C++ using standard \texttt{lex}/\texttt{yacc}
-lexing and parsing. Though it could have been implemented with fewer
-lines of code in another language (e.g. Python Lex-Yacc (PLY) or \texttt{ocamlyacc}), using C++
-forces explicit definition of the language constructs. Strongly typing the
-parse tree elements (debatably) makes the code more approachable for new
-developers.
-
-Code generation is done using two passes. The first pass looks only for
-include files and type definitions. Type definitions are not checked during
-this phase, since they may depend upon include files. All included files
-are sequentially scanned in a first pass. Once the include tree has been
-resolved, a second pass over all files is taken that inserts type definitions
-into the parse tree and raises an error on any undefined types. The program is
-then generated against the parse tree.
-
-Due to inherent complexities and potential for circular dependencies,
-we explicitly disallow forward declaration. Two Thrift structs cannot
-each contain an instance of the other. (Since we do not allow \texttt{null}
-struct instances in the generated C++ code, this would actually be impossible.)
-
-\subsection{TFileTransport}
-The \texttt{TFileTransport} logs Thrift requests/structs by
-framing incoming data with its length and writing it out to disk.
-Using a framed on-disk format allows for better error checking and
-helps with the processing of a finite number of discrete events. The\\
-\texttt{TFileWriterTransport} uses a system of swapping in-memory buffers
-to ensure good performance while logging large amounts of data.
-A Thrift log file is split up into chunks of a specified size; logged messages
-are not allowed to cross chunk boundaries. A message that would cross a chunk
-boundary will cause padding to be added until the end of the chunk, so that the
-first byte of the message is aligned to the beginning of the next chunk.
-Partitioning the file into chunks makes it possible to read and interpret data
-from a particular point in the file.
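The chunk-boundary rule can be illustrated with a small helper that computes how many padding bytes precede a message. This is a sketch under stated assumptions, not the on-disk logic of `TFileTransport` itself: the function name `paddingBefore` is hypothetical, offsets and sizes are in bytes, the chunk size is nonzero, and the message (including any length frame) is assumed to fit within a single chunk.

```cpp
#include <cstdint>

// Returns the number of padding bytes to write at file position `offset`
// before a message of `msgSize` bytes, so that the message never straddles
// a chunk boundary. Assumes chunkSize > 0 and msgSize <= chunkSize.
uint64_t paddingBefore(uint64_t offset, uint64_t msgSize, uint64_t chunkSize) {
  uint64_t usedInChunk = offset % chunkSize;     // bytes already in this chunk
  uint64_t remaining = chunkSize - usedInChunk;  // bytes left in this chunk
  if (msgSize <= remaining) {
    return 0;  // the message fits in the current chunk, no padding needed
  }
  // Otherwise pad to the end of the chunk so that the message starts at
  // the first byte of the next chunk.
  return remaining;
}
```

With 16-byte chunks, for example, a 7-byte message at offset 10 would be preceded by 6 bytes of padding and begin at offset 16, whereas a 6-byte message at the same offset fits exactly and needs none. Because every chunk then starts on a message boundary, a reader can seek to any multiple of the chunk size and begin interpreting data there.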
-
-\section{Facebook Thrift Services}
-Thrift has been employed in a large number of applications at Facebook, including
-search, logging, mobile, ads and the developer platform. Two specific usages are discussed below.
-
-\subsection{Search}
-Thrift is used as the underlying protocol and transport layer for the Facebook Search service.
-The multi-language code generation is well suited for search because it allows for application
-development in an efficient server side language (C++) and allows the Facebook PHP-based web application
-to make calls to the search service using Thrift PHP libraries. There is also a large
-variety of search stats, deployment and testing functionality that is built on top
-of generated Python code. Additionally, the Thrift log file format is
-used as a redo log for providing real-time search index updates. Thrift has allowed the
-search team to leverage each language for its strengths and to develop code at a rapid pace.
-
-\subsection{Logging}
-The Thrift \texttt{TFileTransport} functionality is used for structured logging. Each
-service function definition along with its parameters can be considered to be
-a structured log entry identified by the function name. This log can then be used for
-a variety of purposes, including inline and offline processing, stats aggregation and as a redo log.
-
-\section{Conclusions}
-Thrift has enabled Facebook to build scalable backend
-services efficiently by enabling engineers to divide and conquer. Application
-developers can focus on application code without worrying about the
-sockets layer. We avoid duplicated work by writing buffering and I/O logic
-in one place, rather than interspersing it in each application.
-
-Thrift has been employed in a wide variety of applications at Facebook,
-including search, logging, mobile, ads, and the developer platform. We have
-found that the marginal performance cost incurred by an extra layer of
-software abstraction is far eclipsed by the gains in developer efficiency and
-systems reliability.
-
-\appendix
-
-\section{Similar Systems}
-The following are software systems similar to Thrift. Each is (very!) briefly
-described:
-
-\begin{itemize}
-\item \textit{SOAP.} XML-based. Designed for web services via HTTP, excessive
-XML parsing overhead.
-\item \textit{CORBA.} Relatively comprehensive, debatably overdesigned and
-heavyweight. Comparably cumbersome software installation.
-\item \textit{COM.} Embraced mainly in Windows client software. Not an entirely
-open solution.
-\item \textit{Pillar.} Lightweight and high-performance, but missing versioning
-and abstraction.
-\item \textit{Protocol Buffers.} Closed-source, owned by Google. Described in
-Sawzall paper.
-\end{itemize}
-
-\acks
-
-Many thanks for feedback on Thrift (and extreme trial by fire) are due to
-Martin Smith, Karl Voskuil and Yishan Wong.
-
-Thrift is a successor to Pillar, a similar system developed
-by Adam D'Angelo, first while at Caltech and continued later at Facebook.
-Thrift simply would not have happened without Adam's insights.
-
-\begin{thebibliography}{}
-
-\bibitem{boost.threads}
-Kempf, William,
-``Boost.Threads'',
-\url{http://www.boost.org/doc/html/threads.html}
-
-\bibitem{boost.threadpool}
-Henkel, Philipp,
-``threadpool'',
-\url{http://threadpool.sourceforge.net}
-
-\end{thebibliography}
-
-\end{document}


[3/3] git commit: THRIFT-2450 - include HowToContribute in the src repo Client: build Patch: jfarrell

Posted by jf...@apache.org.
THRIFT-2450 - include HowToContribute in the src repo
Client: build
Patch: jfarrell

Reorganized docs and adds HowToContribute to the code base.


Project: http://git-wip-us.apache.org/repos/asf/thrift/repo
Commit: http://git-wip-us.apache.org/repos/asf/thrift/commit/347a5ebb
Tree: http://git-wip-us.apache.org/repos/asf/thrift/tree/347a5ebb
Diff: http://git-wip-us.apache.org/repos/asf/thrift/diff/347a5ebb

Branch: refs/heads/master
Commit: 347a5ebb2d16c52e523e4b1b96ce804ef18585f2
Parents: 6cf0ffc
Author: jfarrell <jf...@apache.org>
Authored: Sat Apr 5 12:20:07 2014 -0400
Committer: jfarrell <jf...@apache.org>
Committed: Sat Apr 5 12:20:07 2014 -0400

----------------------------------------------------------------------
 doc/HowToCommit.md                |   67 ++
 doc/HowToContribute.md            |   45 ++
 doc/lgpl-2.1.txt                  |  504 ---------------
 doc/licenses/lgpl-2.1.txt         |  504 +++++++++++++++
 doc/licenses/otp-base-license.txt |   20 +
 doc/otp-base-license.txt          |   20 -
 doc/specs/thrift-protocol-spec.md |   96 +++
 doc/specs/thrift-sasl-spec.txt    |  108 ++++
 doc/specs/thrift.tex              | 1057 ++++++++++++++++++++++++++++++++
 doc/thrift-sasl-spec.txt          |  108 ----
 doc/thrift.bnf                    |   96 ---
 doc/thrift.tex                    | 1057 --------------------------------
 12 files changed, 1897 insertions(+), 1785 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/thrift/blob/347a5ebb/doc/HowToCommit.md
----------------------------------------------------------------------
diff --git a/doc/HowToCommit.md b/doc/HowToCommit.md
new file mode 100644
index 0000000..4606da5
--- /dev/null
+++ b/doc/HowToCommit.md
@@ -0,0 +1,67 @@
+## Process used by committers to review and submit patches
+
+1. Make sure that there is an issue for the patch(es) you are about to commit in our [Jira issue tracker]({{ conf.jira_url }})
+ 
+1. Check out the latest version of the source code
+	
+	* git clone https://git-wip-us.apache.org/repos/asf/thrift.git thrift 
+
+1. Apply the patch
+	
+	* curl https://issues.apache.org/jira/... |git apply --ignore-space-change
+	
+	or
+	
+	* curl https://github.com/<GitHub User>/thrift/commit/<Commit ID>.patch |git apply --ignore-space-change
+	
+	
+1. Inspect the applied patch to ensure that all [Legal aspects on Submission of Contributions (Patches)](http://www.apache.org/licenses/LICENSE-2.0.html#contributions) are met
+
+1. Run the necessary unit tests and cross language test cases to verify the patch
+
+1. Commit the patch
+
+		git config user.name "Your Name"
+		git config user.email "YourApacheID@apache.org"
+		git add -A
+		git commit
+		
+		
+1. The commit message should be in the format:
+	
+		THRIFT-###:<Jira description>
+		Client: <component>
+		Patch: <Name of person contributing the patch>
+		
+		Description of what was fixed or addressed.
+		
+		<%
+			if this is a github pull request then copy the below block 
+			from the GitHub email that came to dev@ list, this will 
+			automatically close the GitHub pull request 
+		%>
+		Github Pull Request: This closes #XX
+		----
+		commit 1234567
+		Author: docbrown <do...@example.com>
+		Date:   1985-06-03T01:21:00Z
+
+    		fix for THRIFT-1234
+
+    		fix for THRIFT-1234 fixes the flux capacitor
+
+
+1. Double-check the committed patch to make sure nothing was missed, then push it
+
+		git status
+		git show HEAD
+		git push origin master
+
+		
+1. Resolve the jira issue and set the following for the changelog
+
+	* Component the patch is for  
+	* fixVersion to the current version on master
+	
+
+ 

http://git-wip-us.apache.org/repos/asf/thrift/blob/347a5ebb/doc/HowToContribute.md
----------------------------------------------------------------------
diff --git a/doc/HowToContribute.md b/doc/HowToContribute.md
new file mode 100644
index 0000000..3b538a2
--- /dev/null
+++ b/doc/HowToContribute.md
@@ -0,0 +1,45 @@
+## How to contribute
+
+ 1. Make sure your issue is not already in the [Jira issue tracker]({{ conf.jira_url }})
+ 1. If not, create a ticket describing the change you're proposing in the [Jira issue tracker]({{ conf.jira_url }})
+ 1. Contribute your patch using one of the two methods below
+ 
+### Contributing via a patch
+ 
+1. Check out the latest version of the source code
+	
+	* git clone https://git-wip-us.apache.org/repos/asf/thrift.git thrift 
+
+1. Modify the source to include the improvement/bugfix
+	
+	* Verify that you follow the same CodingStyle you see within the language you are working on
+	* Verify that your change works by adding a unit test.
+
+1. Create a patch from project root directory (e.g. you@dev:~/thrift $ ):
+	
+	* git diff > ../thrift-XXX-my-new-feature.patch
+
+1. Attach the newly generated patch to the issue
+1. Wait for other contributors or committers to review your new addition
+1. Wait for a committer to commit your patch
+ 
+### Contributing via GitHub pull requests
+
+1. Create a fork of http://github.com/apache/thrift
+1. Create a branch with the Jira ticket number you are working on
+1. Modify the source to include the improvement/bugfix
+	
+	* Verify that you follow the same CodingStyle you see within the language you are working on
+	* Verify that your change works by adding a unit test. 
+
+1. Issue a pull request for your new feature
+1. Wait for other contributors or committers to review your new addition
+1. Wait for a committer to commit your patch
+
+### More info
+ 
+ Plenty of information on why and how to contribute is available on the Apache Software Foundation (ASF) web site. In particular, we recommend the following:
+ 
+ * [Contributors Tech Guide](http://www.apache.org/dev/contributors)
+ * [Get involved!](http://www.apache.org/foundation/getinvolved.html)
+ * [Legal aspects on Submission of Contributions (Patches)](http://www.apache.org/licenses/LICENSE-2.0.html#contributions)

http://git-wip-us.apache.org/repos/asf/thrift/blob/347a5ebb/doc/lgpl-2.1.txt
----------------------------------------------------------------------
diff --git a/doc/lgpl-2.1.txt b/doc/lgpl-2.1.txt
deleted file mode 100644
index 5ab7695..0000000
--- a/doc/lgpl-2.1.txt
+++ /dev/null
@@ -1,504 +0,0 @@
-		  GNU LESSER GENERAL PUBLIC LICENSE
-		       Version 2.1, February 1999
-
- Copyright (C) 1991, 1999 Free Software Foundation, Inc.
- 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
- Everyone is permitted to copy and distribute verbatim copies
- of this license document, but changing it is not allowed.
-
-[This is the first released version of the Lesser GPL.  It also counts
- as the successor of the GNU Library Public License, version 2, hence
- the version number 2.1.]
-
-			    Preamble
-
-  The licenses for most software are designed to take away your
-freedom to share and change it.  By contrast, the GNU General Public
-Licenses are intended to guarantee your freedom to share and change
-free software--to make sure the software is free for all its users.
-
-  This license, the Lesser General Public License, applies to some
-specially designated software packages--typically libraries--of the
-Free Software Foundation and other authors who decide to use it.  You
-can use it too, but we suggest you first think carefully about whether
-this license or the ordinary General Public License is the better
-strategy to use in any particular case, based on the explanations below.
-
-  When we speak of free software, we are referring to freedom of use,
-not price.  Our General Public Licenses are designed to make sure that
-you have the freedom to distribute copies of free software (and charge
-for this service if you wish); that you receive source code or can get
-it if you want it; that you can change the software and use pieces of
-it in new free programs; and that you are informed that you can do
-these things.
-
-  To protect your rights, we need to make restrictions that forbid
-distributors to deny you these rights or to ask you to surrender these
-rights.  These restrictions translate to certain responsibilities for
-you if you distribute copies of the library or if you modify it.
-
-  For example, if you distribute copies of the library, whether gratis
-or for a fee, you must give the recipients all the rights that we gave
-you.  You must make sure that they, too, receive or can get the source
-code.  If you link other code with the library, you must provide
-complete object files to the recipients, so that they can relink them
-with the library after making changes to the library and recompiling
-it.  And you must show them these terms so they know their rights.
-
-  We protect your rights with a two-step method: (1) we copyright the
-library, and (2) we offer you this license, which gives you legal
-permission to copy, distribute and/or modify the library.
-
-  To protect each distributor, we want to make it very clear that
-there is no warranty for the free library.  Also, if the library is
-modified by someone else and passed on, the recipients should know
-that what they have is not the original version, so that the original
-author's reputation will not be affected by problems that might be
-introduced by others.
-
-  Finally, software patents pose a constant threat to the existence of
-any free program.  We wish to make sure that a company cannot
-effectively restrict the users of a free program by obtaining a
-restrictive license from a patent holder.  Therefore, we insist that
-any patent license obtained for a version of the library must be
-consistent with the full freedom of use specified in this license.
-
-  Most GNU software, including some libraries, is covered by the
-ordinary GNU General Public License.  This license, the GNU Lesser
-General Public License, applies to certain designated libraries, and
-is quite different from the ordinary General Public License.  We use
-this license for certain libraries in order to permit linking those
-libraries into non-free programs.
-
-  When a program is linked with a library, whether statically or using
-a shared library, the combination of the two is legally speaking a
-combined work, a derivative of the original library.  The ordinary
-General Public License therefore permits such linking only if the
-entire combination fits its criteria of freedom.  The Lesser General
-Public License permits more lax criteria for linking other code with
-the library.
-
-  We call this license the "Lesser" General Public License because it
-does Less to protect the user's freedom than the ordinary General
-Public License.  It also provides other free software developers Less
-of an advantage over competing non-free programs.  These disadvantages
-are the reason we use the ordinary General Public License for many
-libraries.  However, the Lesser license provides advantages in certain
-special circumstances.
-
-  For example, on rare occasions, there may be a special need to
-encourage the widest possible use of a certain library, so that it becomes
-a de-facto standard.  To achieve this, non-free programs must be
-allowed to use the library.  A more frequent case is that a free
-library does the same job as widely used non-free libraries.  In this
-case, there is little to gain by limiting the free library to free
-software only, so we use the Lesser General Public License.
-
-  In other cases, permission to use a particular library in non-free
-programs enables a greater number of people to use a large body of
-free software.  For example, permission to use the GNU C Library in
-non-free programs enables many more people to use the whole GNU
-operating system, as well as its variant, the GNU/Linux operating
-system.
-
-  Although the Lesser General Public License is Less protective of the
-users' freedom, it does ensure that the user of a program that is
-linked with the Library has the freedom and the wherewithal to run
-that program using a modified version of the Library.
-
-  The precise terms and conditions for copying, distribution and
-modification follow.  Pay close attention to the difference between a
-"work based on the library" and a "work that uses the library".  The
-former contains code derived from the library, whereas the latter must
-be combined with the library in order to run.
-
-		  GNU LESSER GENERAL PUBLIC LICENSE
-   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
-
-  0. This License Agreement applies to any software library or other
-program which contains a notice placed by the copyright holder or
-other authorized party saying it may be distributed under the terms of
-this Lesser General Public License (also called "this License").
-Each licensee is addressed as "you".
-
-  A "library" means a collection of software functions and/or data
-prepared so as to be conveniently linked with application programs
-(which use some of those functions and data) to form executables.
-
-  The "Library", below, refers to any such software library or work
-which has been distributed under these terms.  A "work based on the
-Library" means either the Library or any derivative work under
-copyright law: that is to say, a work containing the Library or a
-portion of it, either verbatim or with modifications and/or translated
-straightforwardly into another language.  (Hereinafter, translation is
-included without limitation in the term "modification".)
-
-  "Source code" for a work means the preferred form of the work for
-making modifications to it.  For a library, complete source code means
-all the source code for all modules it contains, plus any associated
-interface definition files, plus the scripts used to control compilation
-and installation of the library.
-
-  Activities other than copying, distribution and modification are not
-covered by this License; they are outside its scope.  The act of
-running a program using the Library is not restricted, and output from
-such a program is covered only if its contents constitute a work based
-on the Library (independent of the use of the Library in a tool for
-writing it).  Whether that is true depends on what the Library does
-and what the program that uses the Library does.
-  
-  1. You may copy and distribute verbatim copies of the Library's
-complete source code as you receive it, in any medium, provided that
-you conspicuously and appropriately publish on each copy an
-appropriate copyright notice and disclaimer of warranty; keep intact
-all the notices that refer to this License and to the absence of any
-warranty; and distribute a copy of this License along with the
-Library.
-
-  You may charge a fee for the physical act of transferring a copy,
-and you may at your option offer warranty protection in exchange for a
-fee.
-
-  2. You may modify your copy or copies of the Library or any portion
-of it, thus forming a work based on the Library, and copy and
-distribute such modifications or work under the terms of Section 1
-above, provided that you also meet all of these conditions:
-
-    a) The modified work must itself be a software library.
-
-    b) You must cause the files modified to carry prominent notices
-    stating that you changed the files and the date of any change.
-
-    c) You must cause the whole of the work to be licensed at no
-    charge to all third parties under the terms of this License.
-
-    d) If a facility in the modified Library refers to a function or a
-    table of data to be supplied by an application program that uses
-    the facility, other than as an argument passed when the facility
-    is invoked, then you must make a good faith effort to ensure that,
-    in the event an application does not supply such function or
-    table, the facility still operates, and performs whatever part of
-    its purpose remains meaningful.
-
-    (For example, a function in a library to compute square roots has
-    a purpose that is entirely well-defined independent of the
-    application.  Therefore, Subsection 2d requires that any
-    application-supplied function or table used by this function must
-    be optional: if the application does not supply it, the square
-    root function must still compute square roots.)
-
-These requirements apply to the modified work as a whole.  If
-identifiable sections of that work are not derived from the Library,
-and can be reasonably considered independent and separate works in
-themselves, then this License, and its terms, do not apply to those
-sections when you distribute them as separate works.  But when you
-distribute the same sections as part of a whole which is a work based
-on the Library, the distribution of the whole must be on the terms of
-this License, whose permissions for other licensees extend to the
-entire whole, and thus to each and every part regardless of who wrote
-it.
-
-Thus, it is not the intent of this section to claim rights or contest
-your rights to work written entirely by you; rather, the intent is to
-exercise the right to control the distribution of derivative or
-collective works based on the Library.
-
-In addition, mere aggregation of another work not based on the Library
-with the Library (or with a work based on the Library) on a volume of
-a storage or distribution medium does not bring the other work under
-the scope of this License.
-
-  3. You may opt to apply the terms of the ordinary GNU General Public
-License instead of this License to a given copy of the Library.  To do
-this, you must alter all the notices that refer to this License, so
-that they refer to the ordinary GNU General Public License, version 2,
-instead of to this License.  (If a newer version than version 2 of the
-ordinary GNU General Public License has appeared, then you can specify
-that version instead if you wish.)  Do not make any other change in
-these notices.
-
-  Once this change is made in a given copy, it is irreversible for
-that copy, so the ordinary GNU General Public License applies to all
-subsequent copies and derivative works made from that copy.
-
-  This option is useful when you wish to copy part of the code of
-the Library into a program that is not a library.
-
-  4. You may copy and distribute the Library (or a portion or
-derivative of it, under Section 2) in object code or executable form
-under the terms of Sections 1 and 2 above provided that you accompany
-it with the complete corresponding machine-readable source code, which
-must be distributed under the terms of Sections 1 and 2 above on a
-medium customarily used for software interchange.
-
-  If distribution of object code is made by offering access to copy
-from a designated place, then offering equivalent access to copy the
-source code from the same place satisfies the requirement to
-distribute the source code, even though third parties are not
-compelled to copy the source along with the object code.
-
-  5. A program that contains no derivative of any portion of the
-Library, but is designed to work with the Library by being compiled or
-linked with it, is called a "work that uses the Library".  Such a
-work, in isolation, is not a derivative work of the Library, and
-therefore falls outside the scope of this License.
-
-  However, linking a "work that uses the Library" with the Library
-creates an executable that is a derivative of the Library (because it
-contains portions of the Library), rather than a "work that uses the
-library".  The executable is therefore covered by this License.
-Section 6 states terms for distribution of such executables.
-
-  When a "work that uses the Library" uses material from a header file
-that is part of the Library, the object code for the work may be a
-derivative work of the Library even though the source code is not.
-Whether this is true is especially significant if the work can be
-linked without the Library, or if the work is itself a library.  The
-threshold for this to be true is not precisely defined by law.
-
-  If such an object file uses only numerical parameters, data
-structure layouts and accessors, and small macros and small inline
-functions (ten lines or less in length), then the use of the object
-file is unrestricted, regardless of whether it is legally a derivative
-work.  (Executables containing this object code plus portions of the
-Library will still fall under Section 6.)
-
-  Otherwise, if the work is a derivative of the Library, you may
-distribute the object code for the work under the terms of Section 6.
-Any executables containing that work also fall under Section 6,
-whether or not they are linked directly with the Library itself.
-
-  6. As an exception to the Sections above, you may also combine or
-link a "work that uses the Library" with the Library to produce a
-work containing portions of the Library, and distribute that work
-under terms of your choice, provided that the terms permit
-modification of the work for the customer's own use and reverse
-engineering for debugging such modifications.
-
-  You must give prominent notice with each copy of the work that the
-Library is used in it and that the Library and its use are covered by
-this License.  You must supply a copy of this License.  If the work
-during execution displays copyright notices, you must include the
-copyright notice for the Library among them, as well as a reference
-directing the user to the copy of this License.  Also, you must do one
-of these things:
-
-    a) Accompany the work with the complete corresponding
-    machine-readable source code for the Library including whatever
-    changes were used in the work (which must be distributed under
-    Sections 1 and 2 above); and, if the work is an executable linked
-    with the Library, with the complete machine-readable "work that
-    uses the Library", as object code and/or source code, so that the
-    user can modify the Library and then relink to produce a modified
-    executable containing the modified Library.  (It is understood
-    that the user who changes the contents of definitions files in the
-    Library will not necessarily be able to recompile the application
-    to use the modified definitions.)
-
-    b) Use a suitable shared library mechanism for linking with the
-    Library.  A suitable mechanism is one that (1) uses at run time a
-    copy of the library already present on the user's computer system,
-    rather than copying library functions into the executable, and (2)
-    will operate properly with a modified version of the library, if
-    the user installs one, as long as the modified version is
-    interface-compatible with the version that the work was made with.
-
-    c) Accompany the work with a written offer, valid for at
-    least three years, to give the same user the materials
-    specified in Subsection 6a, above, for a charge no more
-    than the cost of performing this distribution.
-
-    d) If distribution of the work is made by offering access to copy
-    from a designated place, offer equivalent access to copy the above
-    specified materials from the same place.
-
-    e) Verify that the user has already received a copy of these
-    materials or that you have already sent this user a copy.
-
-  For an executable, the required form of the "work that uses the
-Library" must include any data and utility programs needed for
-reproducing the executable from it.  However, as a special exception,
-the materials to be distributed need not include anything that is
-normally distributed (in either source or binary form) with the major
-components (compiler, kernel, and so on) of the operating system on
-which the executable runs, unless that component itself accompanies
-the executable.
-
-  It may happen that this requirement contradicts the license
-restrictions of other proprietary libraries that do not normally
-accompany the operating system.  Such a contradiction means you cannot
-use both them and the Library together in an executable that you
-distribute.
-
-  7. You may place library facilities that are a work based on the
-Library side-by-side in a single library together with other library
-facilities not covered by this License, and distribute such a combined
-library, provided that the separate distribution of the work based on
-the Library and of the other library facilities is otherwise
-permitted, and provided that you do these two things:
-
-    a) Accompany the combined library with a copy of the same work
-    based on the Library, uncombined with any other library
-    facilities.  This must be distributed under the terms of the
-    Sections above.
-
-    b) Give prominent notice with the combined library of the fact
-    that part of it is a work based on the Library, and explaining
-    where to find the accompanying uncombined form of the same work.
-
-  8. You may not copy, modify, sublicense, link with, or distribute
-the Library except as expressly provided under this License.  Any
-attempt otherwise to copy, modify, sublicense, link with, or
-distribute the Library is void, and will automatically terminate your
-rights under this License.  However, parties who have received copies,
-or rights, from you under this License will not have their licenses
-terminated so long as such parties remain in full compliance.
-
-  9. You are not required to accept this License, since you have not
-signed it.  However, nothing else grants you permission to modify or
-distribute the Library or its derivative works.  These actions are
-prohibited by law if you do not accept this License.  Therefore, by
-modifying or distributing the Library (or any work based on the
-Library), you indicate your acceptance of this License to do so, and
-all its terms and conditions for copying, distributing or modifying
-the Library or works based on it.
-
-  10. Each time you redistribute the Library (or any work based on the
-Library), the recipient automatically receives a license from the
-original licensor to copy, distribute, link with or modify the Library
-subject to these terms and conditions.  You may not impose any further
-restrictions on the recipients' exercise of the rights granted herein.
-You are not responsible for enforcing compliance by third parties with
-this License.
-
-  11. If, as a consequence of a court judgment or allegation of patent
-infringement or for any other reason (not limited to patent issues),
-conditions are imposed on you (whether by court order, agreement or
-otherwise) that contradict the conditions of this License, they do not
-excuse you from the conditions of this License.  If you cannot
-distribute so as to satisfy simultaneously your obligations under this
-License and any other pertinent obligations, then as a consequence you
-may not distribute the Library at all.  For example, if a patent
-license would not permit royalty-free redistribution of the Library by
-all those who receive copies directly or indirectly through you, then
-the only way you could satisfy both it and this License would be to
-refrain entirely from distribution of the Library.
-
-If any portion of this section is held invalid or unenforceable under any
-particular circumstance, the balance of the section is intended to apply,
-and the section as a whole is intended to apply in other circumstances.
-
-It is not the purpose of this section to induce you to infringe any
-patents or other property right claims or to contest validity of any
-such claims; this section has the sole purpose of protecting the
-integrity of the free software distribution system which is
-implemented by public license practices.  Many people have made
-generous contributions to the wide range of software distributed
-through that system in reliance on consistent application of that
-system; it is up to the author/donor to decide if he or she is willing
-to distribute software through any other system and a licensee cannot
-impose that choice.
-
-This section is intended to make thoroughly clear what is believed to
-be a consequence of the rest of this License.
-
-  12. If the distribution and/or use of the Library is restricted in
-certain countries either by patents or by copyrighted interfaces, the
-original copyright holder who places the Library under this License may add
-an explicit geographical distribution limitation excluding those countries,
-so that distribution is permitted only in or among countries not thus
-excluded.  In such case, this License incorporates the limitation as if
-written in the body of this License.
-
-  13. The Free Software Foundation may publish revised and/or new
-versions of the Lesser General Public License from time to time.
-Such new versions will be similar in spirit to the present version,
-but may differ in detail to address new problems or concerns.
-
-Each version is given a distinguishing version number.  If the Library
-specifies a version number of this License which applies to it and
-"any later version", you have the option of following the terms and
-conditions either of that version or of any later version published by
-the Free Software Foundation.  If the Library does not specify a
-license version number, you may choose any version ever published by
-the Free Software Foundation.
-
-  14. If you wish to incorporate parts of the Library into other free
-programs whose distribution conditions are incompatible with these,
-write to the author to ask for permission.  For software which is
-copyrighted by the Free Software Foundation, write to the Free
-Software Foundation; we sometimes make exceptions for this.  Our
-decision will be guided by the two goals of preserving the free status
-of all derivatives of our free software and of promoting the sharing
-and reuse of software generally.
-
-			    NO WARRANTY
-
-  15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
-WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
-EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
-OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
-KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
-LIBRARY IS WITH YOU.  SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
-THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
-
-  16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
-WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
-AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
-FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
-CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
-LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
-RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
-FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
-SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
-DAMAGES.
-
-		     END OF TERMS AND CONDITIONS
-
-           How to Apply These Terms to Your New Libraries
-
-  If you develop a new library, and you want it to be of the greatest
-possible use to the public, we recommend making it free software that
-everyone can redistribute and change.  You can do so by permitting
-redistribution under these terms (or, alternatively, under the terms of the
-ordinary General Public License).
-
-  To apply these terms, attach the following notices to the library.  It is
-safest to attach them to the start of each source file to most effectively
-convey the exclusion of warranty; and each file should have at least the
-"copyright" line and a pointer to where the full notice is found.
-
-    <one line to give the library's name and a brief idea of what it does.>
-    Copyright (C) <year>  <name of author>
-
-    This library is free software; you can redistribute it and/or
-    modify it under the terms of the GNU Lesser General Public
-    License as published by the Free Software Foundation; either
-    version 2.1 of the License, or (at your option) any later version.
-
-    This library is distributed in the hope that it will be useful,
-    but WITHOUT ANY WARRANTY; without even the implied warranty of
-    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-    Lesser General Public License for more details.
-
-    You should have received a copy of the GNU Lesser General Public
-    License along with this library; if not, write to the Free Software
-    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
-
-Also add information on how to contact you by electronic and paper mail.
-
-You should also get your employer (if you work as a programmer) or your
-school, if any, to sign a "copyright disclaimer" for the library, if
-necessary.  Here is a sample; alter the names:
-
-  Yoyodyne, Inc., hereby disclaims all copyright interest in the
-  library `Frob' (a library for tweaking knobs) written by James Random Hacker.
-
-  <signature of Ty Coon>, 1 April 1990
-  Ty Coon, President of Vice
-
-That's all there is to it!
-
-

http://git-wip-us.apache.org/repos/asf/thrift/blob/347a5ebb/doc/licenses/lgpl-2.1.txt
----------------------------------------------------------------------
diff --git a/doc/licenses/lgpl-2.1.txt b/doc/licenses/lgpl-2.1.txt
new file mode 100644
index 0000000..5ab7695
--- /dev/null
+++ b/doc/licenses/lgpl-2.1.txt
@@ -0,0 +1,504 @@
+		  GNU LESSER GENERAL PUBLIC LICENSE
+		       Version 2.1, February 1999
+
+ Copyright (C) 1991, 1999 Free Software Foundation, Inc.
+ 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+[This is the first released version of the Lesser GPL.  It also counts
+ as the successor of the GNU Library Public License, version 2, hence
+ the version number 2.1.]
+
+			    Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+Licenses are intended to guarantee your freedom to share and change
+free software--to make sure the software is free for all its users.
+
+  This license, the Lesser General Public License, applies to some
+specially designated software packages--typically libraries--of the
+Free Software Foundation and other authors who decide to use it.  You
+can use it too, but we suggest you first think carefully about whether
+this license or the ordinary General Public License is the better
+strategy to use in any particular case, based on the explanations below.
+
+  When we speak of free software, we are referring to freedom of use,
+not price.  Our General Public Licenses are designed to make sure that
+you have the freedom to distribute copies of free software (and charge
+for this service if you wish); that you receive source code or can get
+it if you want it; that you can change the software and use pieces of
+it in new free programs; and that you are informed that you can do
+these things.
+
+  To protect your rights, we need to make restrictions that forbid
+distributors to deny you these rights or to ask you to surrender these
+rights.  These restrictions translate to certain responsibilities for
+you if you distribute copies of the library or if you modify it.
+
+  For example, if you distribute copies of the library, whether gratis
+or for a fee, you must give the recipients all the rights that we gave
+you.  You must make sure that they, too, receive or can get the source
+code.  If you link other code with the library, you must provide
+complete object files to the recipients, so that they can relink them
+with the library after making changes to the library and recompiling
+it.  And you must show them these terms so they know their rights.
+
+  We protect your rights with a two-step method: (1) we copyright the
+library, and (2) we offer you this license, which gives you legal
+permission to copy, distribute and/or modify the library.
+
+  To protect each distributor, we want to make it very clear that
+there is no warranty for the free library.  Also, if the library is
+modified by someone else and passed on, the recipients should know
+that what they have is not the original version, so that the original
+author's reputation will not be affected by problems that might be
+introduced by others.
+
+  Finally, software patents pose a constant threat to the existence of
+any free program.  We wish to make sure that a company cannot
+effectively restrict the users of a free program by obtaining a
+restrictive license from a patent holder.  Therefore, we insist that
+any patent license obtained for a version of the library must be
+consistent with the full freedom of use specified in this license.
+
+  Most GNU software, including some libraries, is covered by the
+ordinary GNU General Public License.  This license, the GNU Lesser
+General Public License, applies to certain designated libraries, and
+is quite different from the ordinary General Public License.  We use
+this license for certain libraries in order to permit linking those
+libraries into non-free programs.
+
+  When a program is linked with a library, whether statically or using
+a shared library, the combination of the two is legally speaking a
+combined work, a derivative of the original library.  The ordinary
+General Public License therefore permits such linking only if the
+entire combination fits its criteria of freedom.  The Lesser General
+Public License permits more lax criteria for linking other code with
+the library.
+
+  We call this license the "Lesser" General Public License because it
+does Less to protect the user's freedom than the ordinary General
+Public License.  It also provides other free software developers Less
+of an advantage over competing non-free programs.  These disadvantages
+are the reason we use the ordinary General Public License for many
+libraries.  However, the Lesser license provides advantages in certain
+special circumstances.
+
+  For example, on rare occasions, there may be a special need to
+encourage the widest possible use of a certain library, so that it becomes
+a de-facto standard.  To achieve this, non-free programs must be
+allowed to use the library.  A more frequent case is that a free
+library does the same job as widely used non-free libraries.  In this
+case, there is little to gain by limiting the free library to free
+software only, so we use the Lesser General Public License.
+
+  In other cases, permission to use a particular library in non-free
+programs enables a greater number of people to use a large body of
+free software.  For example, permission to use the GNU C Library in
+non-free programs enables many more people to use the whole GNU
+operating system, as well as its variant, the GNU/Linux operating
+system.
+
+  Although the Lesser General Public License is Less protective of the
+users' freedom, it does ensure that the user of a program that is
+linked with the Library has the freedom and the wherewithal to run
+that program using a modified version of the Library.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.  Pay close attention to the difference between a
+"work based on the library" and a "work that uses the library".  The
+former contains code derived from the library, whereas the latter must
+be combined with the library in order to run.
+
+		  GNU LESSER GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License Agreement applies to any software library or other
+program which contains a notice placed by the copyright holder or
+other authorized party saying it may be distributed under the terms of
+this Lesser General Public License (also called "this License").
+Each licensee is addressed as "you".
+
+  A "library" means a collection of software functions and/or data
+prepared so as to be conveniently linked with application programs
+(which use some of those functions and data) to form executables.
+
+  The "Library", below, refers to any such software library or work
+which has been distributed under these terms.  A "work based on the
+Library" means either the Library or any derivative work under
+copyright law: that is to say, a work containing the Library or a
+portion of it, either verbatim or with modifications and/or translated
+straightforwardly into another language.  (Hereinafter, translation is
+included without limitation in the term "modification".)
+
+  "Source code" for a work means the preferred form of the work for
+making modifications to it.  For a library, complete source code means
+all the source code for all modules it contains, plus any associated
+interface definition files, plus the scripts used to control compilation
+and installation of the library.
+
+  Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope.  The act of
+running a program using the Library is not restricted, and output from
+such a program is covered only if its contents constitute a work based
+on the Library (independent of the use of the Library in a tool for
+writing it).  Whether that is true depends on what the Library does
+and what the program that uses the Library does.
+  
+  1. You may copy and distribute verbatim copies of the Library's
+complete source code as you receive it, in any medium, provided that
+you conspicuously and appropriately publish on each copy an
+appropriate copyright notice and disclaimer of warranty; keep intact
+all the notices that refer to this License and to the absence of any
+warranty; and distribute a copy of this License along with the
+Library.
+
+  You may charge a fee for the physical act of transferring a copy,
+and you may at your option offer warranty protection in exchange for a
+fee.
+
+  2. You may modify your copy or copies of the Library or any portion
+of it, thus forming a work based on the Library, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+    a) The modified work must itself be a software library.
+
+    b) You must cause the files modified to carry prominent notices
+    stating that you changed the files and the date of any change.
+
+    c) You must cause the whole of the work to be licensed at no
+    charge to all third parties under the terms of this License.
+
+    d) If a facility in the modified Library refers to a function or a
+    table of data to be supplied by an application program that uses
+    the facility, other than as an argument passed when the facility
+    is invoked, then you must make a good faith effort to ensure that,
+    in the event an application does not supply such function or
+    table, the facility still operates, and performs whatever part of
+    its purpose remains meaningful.
+
+    (For example, a function in a library to compute square roots has
+    a purpose that is entirely well-defined independent of the
+    application.  Therefore, Subsection 2d requires that any
+    application-supplied function or table used by this function must
+    be optional: if the application does not supply it, the square
+    root function must still compute square roots.)
+
+These requirements apply to the modified work as a whole.  If
+identifiable sections of that work are not derived from the Library,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works.  But when you
+distribute the same sections as part of a whole which is a work based
+on the Library, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote
+it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Library.
+
+In addition, mere aggregation of another work not based on the Library
+with the Library (or with a work based on the Library) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may opt to apply the terms of the ordinary GNU General Public
+License instead of this License to a given copy of the Library.  To do
+this, you must alter all the notices that refer to this License, so
+that they refer to the ordinary GNU General Public License, version 2,
+instead of to this License.  (If a newer version than version 2 of the
+ordinary GNU General Public License has appeared, then you can specify
+that version instead if you wish.)  Do not make any other change in
+these notices.
+
+  Once this change is made in a given copy, it is irreversible for
+that copy, so the ordinary GNU General Public License applies to all
+subsequent copies and derivative works made from that copy.
+
+  This option is useful when you wish to copy part of the code of
+the Library into a program that is not a library.
+
+  4. You may copy and distribute the Library (or a portion or
+derivative of it, under Section 2) in object code or executable form
+under the terms of Sections 1 and 2 above provided that you accompany
+it with the complete corresponding machine-readable source code, which
+must be distributed under the terms of Sections 1 and 2 above on a
+medium customarily used for software interchange.
+
+  If distribution of object code is made by offering access to copy
+from a designated place, then offering equivalent access to copy the
+source code from the same place satisfies the requirement to
+distribute the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  5. A program that contains no derivative of any portion of the
+Library, but is designed to work with the Library by being compiled or
+linked with it, is called a "work that uses the Library".  Such a
+work, in isolation, is not a derivative work of the Library, and
+therefore falls outside the scope of this License.
+
+  However, linking a "work that uses the Library" with the Library
+creates an executable that is a derivative of the Library (because it
+contains portions of the Library), rather than a "work that uses the
+library".  The executable is therefore covered by this License.
+Section 6 states terms for distribution of such executables.
+
+  When a "work that uses the Library" uses material from a header file
+that is part of the Library, the object code for the work may be a
+derivative work of the Library even though the source code is not.
+Whether this is true is especially significant if the work can be
+linked without the Library, or if the work is itself a library.  The
+threshold for this to be true is not precisely defined by law.
+
+  If such an object file uses only numerical parameters, data
+structure layouts and accessors, and small macros and small inline
+functions (ten lines or less in length), then the use of the object
+file is unrestricted, regardless of whether it is legally a derivative
+work.  (Executables containing this object code plus portions of the
+Library will still fall under Section 6.)
+
+  Otherwise, if the work is a derivative of the Library, you may
+distribute the object code for the work under the terms of Section 6.
+Any executables containing that work also fall under Section 6,
+whether or not they are linked directly with the Library itself.
+
+  6. As an exception to the Sections above, you may also combine or
+link a "work that uses the Library" with the Library to produce a
+work containing portions of the Library, and distribute that work
+under terms of your choice, provided that the terms permit
+modification of the work for the customer's own use and reverse
+engineering for debugging such modifications.
+
+  You must give prominent notice with each copy of the work that the
+Library is used in it and that the Library and its use are covered by
+this License.  You must supply a copy of this License.  If the work
+during execution displays copyright notices, you must include the
+copyright notice for the Library among them, as well as a reference
+directing the user to the copy of this License.  Also, you must do one
+of these things:
+
+    a) Accompany the work with the complete corresponding
+    machine-readable source code for the Library including whatever
+    changes were used in the work (which must be distributed under
+    Sections 1 and 2 above); and, if the work is an executable linked
+    with the Library, with the complete machine-readable "work that
+    uses the Library", as object code and/or source code, so that the
+    user can modify the Library and then relink to produce a modified
+    executable containing the modified Library.  (It is understood
+    that the user who changes the contents of definitions files in the
+    Library will not necessarily be able to recompile the application
+    to use the modified definitions.)
+
+    b) Use a suitable shared library mechanism for linking with the
+    Library.  A suitable mechanism is one that (1) uses at run time a
+    copy of the library already present on the user's computer system,
+    rather than copying library functions into the executable, and (2)
+    will operate properly with a modified version of the library, if
+    the user installs one, as long as the modified version is
+    interface-compatible with the version that the work was made with.
+
+    c) Accompany the work with a written offer, valid for at
+    least three years, to give the same user the materials
+    specified in Subsection 6a, above, for a charge no more
+    than the cost of performing this distribution.
+
+    d) If distribution of the work is made by offering access to copy
+    from a designated place, offer equivalent access to copy the above
+    specified materials from the same place.
+
+    e) Verify that the user has already received a copy of these
+    materials or that you have already sent this user a copy.
+
+  For an executable, the required form of the "work that uses the
+Library" must include any data and utility programs needed for
+reproducing the executable from it.  However, as a special exception,
+the materials to be distributed need not include anything that is
+normally distributed (in either source or binary form) with the major
+components (compiler, kernel, and so on) of the operating system on
+which the executable runs, unless that component itself accompanies
+the executable.
+
+  It may happen that this requirement contradicts the license
+restrictions of other proprietary libraries that do not normally
+accompany the operating system.  Such a contradiction means you cannot
+use both them and the Library together in an executable that you
+distribute.
+
+  7. You may place library facilities that are a work based on the
+Library side-by-side in a single library together with other library
+facilities not covered by this License, and distribute such a combined
+library, provided that the separate distribution of the work based on
+the Library and of the other library facilities is otherwise
+permitted, and provided that you do these two things:
+
+    a) Accompany the combined library with a copy of the same work
+    based on the Library, uncombined with any other library
+    facilities.  This must be distributed under the terms of the
+    Sections above.
+
+    b) Give prominent notice with the combined library of the fact
+    that part of it is a work based on the Library, and explaining
+    where to find the accompanying uncombined form of the same work.
+
+  8. You may not copy, modify, sublicense, link with, or distribute
+the Library except as expressly provided under this License.  Any
+attempt otherwise to copy, modify, sublicense, link with, or
+distribute the Library is void, and will automatically terminate your
+rights under this License.  However, parties who have received copies,
+or rights, from you under this License will not have their licenses
+terminated so long as such parties remain in full compliance.
+
+  9. You are not required to accept this License, since you have not
+signed it.  However, nothing else grants you permission to modify or
+distribute the Library or its derivative works.  These actions are
+prohibited by law if you do not accept this License.  Therefore, by
+modifying or distributing the Library (or any work based on the
+Library), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Library or works based on it.
+
+  10. Each time you redistribute the Library (or any work based on the
+Library), the recipient automatically receives a license from the
+original licensor to copy, distribute, link with or modify the Library
+subject to these terms and conditions.  You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties with
+this License.
+
+  11. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Library at all.  For example, if a patent
+license would not permit royalty-free redistribution of the Library by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Library.
+
+If any portion of this section is held invalid or unenforceable under any
+particular circumstance, the balance of the section is intended to apply,
+and the section as a whole is intended to apply in other circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system which is
+implemented by public license practices.  Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  12. If the distribution and/or use of the Library is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Library under this License may add
+an explicit geographical distribution limitation excluding those countries,
+so that distribution is permitted only in or among countries not thus
+excluded.  In such case, this License incorporates the limitation as if
+written in the body of this License.
+
+  13. The Free Software Foundation may publish revised and/or new
+versions of the Lesser General Public License from time to time.
+Such new versions will be similar in spirit to the present version,
+but may differ in detail to address new problems or concerns.
+
+Each version is given a distinguishing version number.  If the Library
+specifies a version number of this License which applies to it and
+"any later version", you have the option of following the terms and
+conditions either of that version or of any later version published by
+the Free Software Foundation.  If the Library does not specify a
+license version number, you may choose any version ever published by
+the Free Software Foundation.
+
+  14. If you wish to incorporate parts of the Library into other free
+programs whose distribution conditions are incompatible with these,
+write to the author to ask for permission.  For software which is
+copyrighted by the Free Software Foundation, write to the Free
+Software Foundation; we sometimes make exceptions for this.  Our
+decision will be guided by the two goals of preserving the free status
+of all derivatives of our free software and of promoting the sharing
+and reuse of software generally.
+
+			    NO WARRANTY
+
+  15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
+WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
+EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
+OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
+KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
+LIBRARY IS WITH YOU.  SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
+THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
+WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
+AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
+FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
+CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
+LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
+RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
+FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
+SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
+DAMAGES.
+
+		     END OF TERMS AND CONDITIONS
+
+           How to Apply These Terms to Your New Libraries
+
+  If you develop a new library, and you want it to be of the greatest
+possible use to the public, we recommend making it free software that
+everyone can redistribute and change.  You can do so by permitting
+redistribution under these terms (or, alternatively, under the terms of the
+ordinary General Public License).
+
+  To apply these terms, attach the following notices to the library.  It is
+safest to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least the
+"copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the library's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This library is free software; you can redistribute it and/or
+    modify it under the terms of the GNU Lesser General Public
+    License as published by the Free Software Foundation; either
+    version 2.1 of the License, or (at your option) any later version.
+
+    This library is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+
+    You should have received a copy of the GNU Lesser General Public
+    License along with this library; if not, write to the Free Software
+    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+
+Also add information on how to contact you by electronic and paper mail.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the library, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the
+  library `Frob' (a library for tweaking knobs) written by James Random Hacker.
+
+  <signature of Ty Coon>, 1 April 1990
+  Ty Coon, President of Vice
+
+That's all there is to it!
+
+

http://git-wip-us.apache.org/repos/asf/thrift/blob/347a5ebb/doc/licenses/otp-base-license.txt
----------------------------------------------------------------------
diff --git a/doc/licenses/otp-base-license.txt b/doc/licenses/otp-base-license.txt
new file mode 100644
index 0000000..8ee2992
--- /dev/null
+++ b/doc/licenses/otp-base-license.txt
@@ -0,0 +1,20 @@
+Tue Oct 24 12:28:44 CDT 2006
+
+Copyright (c) <2006> <Martin J. Logan, Erlware> 
+
+Permission is hereby granted, free of charge, to any person obtaining a copy 
+of this software (OTP Base, fslib, G.A.S)  and associated documentation files (the "Software"), to deal 
+in the Software without restriction, including without limitation the rights to 
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 
+of the Software, and to permit persons to whom the Software is furnished to do 
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all 
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 
+INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 
+PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 
+HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE 
+OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

http://git-wip-us.apache.org/repos/asf/thrift/blob/347a5ebb/doc/otp-base-license.txt
----------------------------------------------------------------------
diff --git a/doc/otp-base-license.txt b/doc/otp-base-license.txt
deleted file mode 100644
index 8ee2992..0000000
--- a/doc/otp-base-license.txt
+++ /dev/null
@@ -1,20 +0,0 @@
-Tue Oct 24 12:28:44 CDT 2006
-
-Copyright (c) <2006> <Martin J. Logan, Erlware> 
-
-Permission is hereby granted, free of charge, to any person obtaining a copy 
-of this software (OTP Base, fslib, G.A.S)  and associated documentation files (the "Software"), to deal 
-in the Software without restriction, including without limitation the rights to 
-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 
-of the Software, and to permit persons to whom the Software is furnished to do 
-so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all 
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 
-INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 
-PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 
-HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 
-OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE 
-OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

http://git-wip-us.apache.org/repos/asf/thrift/blob/347a5ebb/doc/specs/thrift-protocol-spec.md
----------------------------------------------------------------------
diff --git a/doc/specs/thrift-protocol-spec.md b/doc/specs/thrift-protocol-spec.md
new file mode 100644
index 0000000..24d83f6
--- /dev/null
+++ b/doc/specs/thrift-protocol-spec.md
@@ -0,0 +1,96 @@
+Thrift Protocol Structure
+
+Last Modified: 2007-Jun-29
+
+--------------------------------------------------------------------
+
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+
+--------------------------------------------------------------------
+
+This document describes the structure of the Thrift protocol
+without specifying the encoding. Thus, the order of elements
+could in some cases be rearranged depending upon the TProtocol
+implementation, but this document specifies the minimum required
+structure. There are some "dumb" terminals like STRING and INT
+that take the place of an actual encoding specification.
+
+The key point to notice is that ALL messages are just one wrapped
+<struct>. Depending upon the message type, the <struct> can be
+interpreted as the argument list to a function, the return value
+of a function, or an exception.
+
+--------------------------------------------------------------------
+
+       <message> ::= <message-begin> <struct> <message-end>
+
+ <message-begin> ::= <method-name> <message-type> <message-seqid>
+
+   <method-name> ::= STRING
+
+  <message-type> ::= T_CALL | T_REPLY | T_EXCEPTION
+
+ <message-seqid> ::= I32
+
+        <struct> ::= <struct-begin> <field>* <field-stop> <struct-end>
+
+  <struct-begin> ::= <struct-name>
+
+   <struct-name> ::= STRING
+
+    <field-stop> ::= T_STOP
+
+         <field> ::= <field-begin> <field-data> <field-end>
+
+   <field-begin> ::= <field-name> <field-type> <field-id>
+
+    <field-name> ::= STRING
+
+    <field-type> ::= T_BOOL | T_BYTE | T_I8 | T_I16 | T_I32 | T_I64 | T_DOUBLE
+                     | T_STRING | T_BINARY | T_STRUCT | T_MAP | T_SET | T_LIST
+
+      <field-id> ::= I16
+
+    <field-data> ::= I8 | I16 | I32 | I64 | DOUBLE | STRING | BINARY
+                     | <struct> | <map> | <list> | <set>
+
+           <map> ::= <map-begin> <field-data>* <map-end>
+
+     <map-begin> ::= <map-key-type> <map-value-type> <map-size>
+
+  <map-key-type> ::= <field-type>
+
+<map-value-type> ::= <field-type>
+
+      <map-size> ::= I32
+
+          <list> ::= <list-begin> <field-data>* <list-end>
+
+    <list-begin> ::= <list-elem-type> <list-size>
+
+<list-elem-type> ::= <field-type>
+
+     <list-size> ::= I32
+
+           <set> ::= <set-begin> <field-data>* <set-end>
+
+     <set-begin> ::= <set-elem-type> <set-size>
+
+ <set-elem-type> ::= <field-type>
+
+      <set-size> ::= I32
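
Note on the grammar in thrift-protocol-spec.md: a minimal Python sketch of how
one message expands into the abstract element sequence. It is only an
illustration of the grammar, not a TProtocol implementation: the helper names
(struct_tokens, message_tokens), the tuple representation, and the sample
method "add" with its "add_args" struct are hypothetical and do not appear in
the spec.

```python
def struct_tokens(name, fields):
    """Expand <struct> ::= <struct-begin> <field>* <field-stop> <struct-end>.

    `fields` is a list of (field_name, field_type, field_id, data_tokens).
    """
    tokens = [("struct-begin", name)]
    for field_name, field_type, field_id, data_tokens in fields:
        # <field-begin> ::= <field-name> <field-type> <field-id>
        tokens.append(("field-begin", field_name, field_type, field_id))
        tokens.extend(data_tokens)  # <field-data>
        tokens.append(("field-end",))
    tokens.append(("field-stop", "T_STOP"))
    tokens.append(("struct-end",))
    return tokens


def message_tokens(method_name, message_type, seqid, struct):
    """Expand <message> ::= <message-begin> <struct> <message-end>."""
    # <message-begin> ::= <method-name> <message-type> <message-seqid>
    return [("message-begin", method_name, message_type, seqid),
            *struct,
            ("message-end",)]


# Hypothetical call add(a=1, b=2): the argument list is a single wrapped struct.
args = struct_tokens("add_args", [
    ("a", "T_I32", 1, [("I32", 1)]),
    ("b", "T_I32", 2, [("I32", 2)]),
])
call = message_tokens("add", "T_CALL", 7, args)

for token in call:
    print(token)
```

A T_REPLY or T_EXCEPTION message would reuse message_tokens unchanged; only the
wrapped struct and the message type differ.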

http://git-wip-us.apache.org/repos/asf/thrift/blob/347a5ebb/doc/specs/thrift-sasl-spec.txt
----------------------------------------------------------------------
diff --git a/doc/specs/thrift-sasl-spec.txt b/doc/specs/thrift-sasl-spec.txt
new file mode 100644
index 0000000..02cf79e
--- /dev/null
+++ b/doc/specs/thrift-sasl-spec.txt
@@ -0,0 +1,108 @@
+A Thrift SASL message shall be a byte array of the following form:
+
+| 1-byte status code | 4-byte payload length | variable-length payload |
+
+The length field shall be interpreted as an integer, with the high byte sent
+first. It gives the length of the payload immediately following it, not
+including the status code or the length bytes.
+
+The possible status codes are:
+
+0x01 - START - Hello, let's go on a date.
+0x02 - OK - Everything's been going alright so far, let's see each other again.
+0x03 - BAD - I understand what you're saying. I really do. I just don't like it. We have to break up.
+0x04 - ERROR - We can't go on like this. It's like you're speaking another language.
+0x05 - COMPLETE - Will you marry me?
+
+The Thrift SASL communication will proceed as follows:
+
+1. The client is configured at instantiation of the transport with a single
+underlying SASL security mechanism that it supports.
+
+2. The server is configured with a mapping of underlying security mechanism
+name -> mechanism options.
+
+3. At connection time, the client will initiate communication by sending the
+server a START message. The payload of this message will be the name of the
+underlying security mechanism that the client would like to use.
+This mechanism name shall be 1-20 characters in length and follow the
+rules for SASL mechanism names given in RFC 2222.
+
+4. The server receives this message and, if the mechanism name provided is
+among the set of mechanisms this server transport is configured to accept,
+appropriate initialization of the underlying security mechanism may take place.
+If the mechanism name is not one which the server is configured to support, the
+server shall return the BAD byte, followed by a 4-byte, potentially zero-value
+message length, followed by the potentially zero-length payload which may be a
+status code or message indicating failure. No further communication may take
+place via this transport. If the mechanism name is one which the server
+supports, then proceed to step 5.
+
+5. Following the START message, the client must send another message containing
+the "initial response" of the chosen SASL implementation. The client may send
+this message piggy-backed on the "START" message of step 3. The message type
+of this message must be either "OK" or "COMPLETE", depending on whether the
+SASL implementation indicates that this side of the authentication has been
+satisfied.
+
+6. The server then provides the byte array of the payload received to its
+underlying security mechanism. A challenge is generated by the underlying
+security mechanism on the server, and this is used as the payload for a message
+sent to the client. This message shall consist of an OK byte, followed by the
+non-zero message length word, followed by the payload.
+
+7. The client receives this message from the server and passes the payload to
+its underlying security mechanism to generate a response. The client then sends
+the server an OK byte, followed by the non-zero-value length of the response,
+followed by the bytes of the response as the payload.
+
+8. Steps 6 and 7 are repeated until both security mechanisms are satisfied with
+the challenge/response exchange. When either side has completed its security
+protocol, its next message shall be the COMPLETE byte, followed by a 4-byte
+potentially zero-value length word, followed by a potentially zero-length
+payload. This payload will be empty except for those underlying security
+mechanisms which provide additional data with success.
+
+If at any point in time either side is able to interpret the challenge or
+response sent by the other, but is dissatisfied with the contents thereof, this
+side should send the other a BAD byte, followed by a 4-byte potentially
+zero-value length word, followed by an optional, potentially zero-length
+message encoded in UTF-8 indicating failure. This message should be passed to
+the protocol above the thrift transport by whatever mechanism is appropriate
+and idiomatic for the particular language these thrift bindings are for.
+
+If at any point in time either side fails to interpret the challenge or
+response sent by the other, this side should send the other an ERROR byte,
+followed by a 4-byte potentially zero-value length word, followed by an
+optional, potentially zero-length message encoded in UTF-8. This message should
+be passed to the protocol above the thrift transport by whatever mechanism is
+appropriate and idiomatic for the particular language these thrift bindings are
+for.
+
+If step 8 completes successfully, then the communication is considered
+authenticated and subsequent communication may commence.
+
+If step 8 fails to complete successfully, then no further communication may
+take place via this transport.
+
+9. All writes to the underlying transport must be prefixed by the 4-byte length
+of the payload data, followed by the payload. All reads from this transport
+should read the 4-byte length word, then read the full quantity of bytes
+specified by this length word.
+
+If no SASL QOP (quality of protection) is negotiated during steps 6 and 7, then
+all subsequent writes to/reads from this transport are written/read unaltered,
+save for the length prefix, to the underlying transport.
+
+If a SASL QOP is negotiated, then this must be used by the Thrift transport for
+all subsequent communication. This is done by wrapping subsequent writes to the
+transport using the underlying security mechanism, and unwrapping subsequent
+reads from the underlying transport. Note that in this case, the length prefix
+of the write to the underlying transport is the length of the data after it has
+been wrapped by the underlying security mechanism. Note that the complete
+message must be read before giving this data to the underlying security
+mechanism for unwrapping.
+
+If at any point in time reading of a message fails either because of a
+malformed length word or failure to unwrap by the underlying security
+mechanism, then all further communication on this transport must cease.
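
Note on the negotiation framing in thrift-sasl-spec.txt: a minimal Python
sketch of the "1-byte status code | 4-byte payload length | payload" layout.
The names SaslStatus, encode_sasl_message and decode_sasl_message are
hypothetical, and treating the length as unsigned is an assumption; the spec
only says it is an integer sent high byte first.

```python
import struct
from enum import IntEnum


class SaslStatus(IntEnum):
    # Status codes as listed in the spec.
    START = 0x01
    OK = 0x02
    BAD = 0x03
    ERROR = 0x04
    COMPLETE = 0x05


def encode_sasl_message(status, payload=b""):
    """Build status byte + 4-byte big-endian payload length + payload."""
    # ">BI" = big-endian, no padding: unsigned byte, unsigned 32-bit int
    # (unsigned is an assumption, see the note above).
    return struct.pack(">BI", status, len(payload)) + payload


def decode_sasl_message(data):
    """Split one complete message into (status, payload)."""
    if len(data) < 5:
        raise ValueError("truncated header")
    status, length = struct.unpack(">BI", data[:5])
    payload = data[5:5 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return SaslStatus(status), payload


# Step 3: the client opens with START carrying the mechanism name.
start = encode_sasl_message(SaslStatus.START, b"PLAIN")
print(start.hex())
print(decode_sasl_message(start))
```

The length covers only the payload, so a COMPLETE message with no additional
data is exactly five bytes long.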


[2/3] THRIFT-2450 - include HowToContribute in the src repo Client: build Patch: jfarrell

Posted by jf...@apache.org.
http://git-wip-us.apache.org/repos/asf/thrift/blob/347a5ebb/doc/specs/thrift.tex
----------------------------------------------------------------------
diff --git a/doc/specs/thrift.tex b/doc/specs/thrift.tex
new file mode 100644
index 0000000..a706fcb
--- /dev/null
+++ b/doc/specs/thrift.tex
@@ -0,0 +1,1057 @@
+%-----------------------------------------------------------------------------
+%
+%               Thrift whitepaper
+%
+% Name:         thrift.tex
+%
+% Authors:      Mark Slee (mcslee@facebook.com)
+%
+% Created:      05 March 2007
+%
+% You will need a copy of sigplanconf.cls to format this document.
+% It is available at <http://www.sigplan.org/authorInformation.htm>.
+%
+%-----------------------------------------------------------------------------
+
+
+\documentclass[nocopyrightspace,blockstyle]{sigplanconf}
+
+\usepackage{amssymb}
+\usepackage{amsfonts}
+\usepackage{amsmath}
+\usepackage{url}
+
+\begin{document}
+
+% \conferenceinfo{WXYZ '05}{date, City.}
+% \copyrightyear{2007}
+% \copyrightdata{[to be supplied]}
+
+% \titlebanner{banner above paper title}        % These are ignored unless
+% \preprintfooter{short description of paper}   % 'preprint' option specified.
+
+\title{Thrift: Scalable Cross-Language Services Implementation}
+\subtitle{}
+
+\authorinfo{Mark Slee, Aditya Agarwal and Marc Kwiatkowski}
+           {Facebook, 156 University Ave, Palo Alto, CA}
+           {\{mcslee,aditya,marc\}@facebook.com}
+
+\maketitle
+
+\begin{abstract}
+Thrift is a software library and set of code-generation tools developed at
+Facebook to expedite development and implementation of efficient and scalable
+backend services. Its primary goal is to enable efficient and reliable
+communication across programming languages by abstracting the portions of each
+language that tend to require the most customization into a common library
+that is implemented in each language. Specifically, Thrift allows developers to
+define datatypes and service interfaces in a single language-neutral file
+and generate all the necessary code to build RPC clients and servers.
+
+This paper details the motivations and design choices we made in Thrift, as
+well as some of the more interesting implementation details. It is not
+intended to be taken as research, but rather it is an exposition on what we did
+and why.
+\end{abstract}
+
+% \category{D.3.3}{Programming Languages}{Language constructs and features}
+
+%\terms
+%Languages, serialization, remote procedure call
+
+%\keywords
+%Data description language, interface definition language, remote procedure call
+
+\section{Introduction}
+As Facebook's traffic and network structure have scaled, the resource
+demands of many operations on the site (e.g., search,
+ad selection and delivery, event logging) have presented technical requirements
+drastically outside the scope of the LAMP framework. In our implementation of
+these services, various programming languages have been selected to
+optimize for the right combination of performance, ease and speed of
+development, availability of existing libraries, etc. By and large,
+Facebook's engineering culture has tended towards choosing the best
+tools and implementations available over standardizing on any one
+programming language and begrudgingly accepting its inherent limitations.
+
+Given this design choice, we were presented with the challenge of building
+a transparent, high-performance bridge across many programming languages.
+We found that most available solutions were either too limited, did not offer
+sufficient datatype freedom, or suffered from subpar performance.
+\footnote{See Appendix A for a discussion of alternative systems.}
+
+The solution that we have implemented combines a language-neutral software
+stack implemented across numerous programming languages and an associated code
+generation engine that transforms a simple interface and data definition
+language into client and server remote procedure call libraries.
+Choosing static code generation over a dynamic system allows us to create
+validated code that can be run without the need for
+any advanced introspective run-time type checking. It is also designed to
+be as simple as possible for the developer, who can typically define all
+the necessary data structures and interfaces for a complex service in a single
+short file.
+
+Surprised that a robust open solution to these relatively common problems
+did not yet exist, we committed early on to making the Thrift implementation
+open source.
+
+In evaluating the challenges of cross-language interaction in a networked
+environment, some key components were identified:
+
+\textit{Types.} A common type system must exist across programming languages
+without requiring that the application developer use custom Thrift datatypes
+or write their own serialization code. That is,
+a C++ programmer should be able to transparently exchange a strongly typed
+STL map for a dynamic Python dictionary. Neither
+programmer should be forced to write any code below the application layer
+to achieve this. Section 2 details the Thrift type system.
+
+\textit{Transport.} Each language must have a common interface to
+bidirectional raw data transport. The specifics of how a given
+transport is implemented should not matter to the service developer.
+The same application code should be able to run against TCP stream sockets,
+raw data in memory, or files on disk. Section 3 details the Thrift Transport
+layer.
+
+\textit{Protocol.} Datatypes must have some way of using the Transport
+layer to encode and decode themselves. Again, the application
+developer need not be concerned by this layer. Whether the service uses
+an XML or binary protocol is immaterial to the application code.
+All that matters is that the data can be read and written in a consistent,
+deterministic manner. Section 4 details the Thrift Protocol layer.
+
+\textit{Versioning.} For robust services, the involved datatypes must
+provide a mechanism for versioning themselves. Specifically,
+it should be possible to add or remove fields in an object or alter the
+argument list of a function without any interruption in service (or,
+worse yet, nasty segmentation faults). Section 5 details Thrift's versioning
+system.
+
+\textit{Processors.} Finally, we generate code capable of processing data
+streams to accomplish remote procedure calls. Section 6 details the generated
+code and TProcessor paradigm.
+
+Section 7 discusses implementation details, and Section 8 describes
+our conclusions.
+
+\section{Types}
+
+The goal of the Thrift type system is to enable programmers to develop using
+completely natively defined types, no matter what programming language they
+use. By design, the Thrift type system does not introduce any special dynamic
+types or wrapper objects. It also does not require that the developer write
+any code for object serialization or transport. The Thrift IDL (Interface
+Definition Language) file is
+logically a way for developers to annotate their data structures with the
+minimal amount of extra information necessary to tell a code generator
+how to safely transport the objects across languages.
+
+\subsection{Base Types}
+
+The type system rests upon a few base types. In considering which types to
+support, we aimed for clarity and simplicity over abundance, focusing
+on the key types available in all programming languages, omitting any
+niche types available only in specific languages.
+
+The base types supported by Thrift are:
+\begin{itemize}
+\item \texttt{bool} A boolean value, true or false
+\item \texttt{byte} A signed byte
+\item \texttt{i16} A 16-bit signed integer
+\item \texttt{i32} A 32-bit signed integer
+\item \texttt{i64} A 64-bit signed integer
+\item \texttt{double} A 64-bit floating point number
+\item \texttt{string} An encoding-agnostic text or binary string
+\item \texttt{binary} A byte array representation for blobs
+\end{itemize}
+
+Of particular note is the absence of unsigned integer types. Because these
+types have no direct translation to native primitive types in many languages,
+the advantages they afford are lost. Further, there is no way to prevent the
+application developer in a language like Python from assigning a negative value
+to an integer variable, leading to unpredictable behavior. From a design
+standpoint, we observed that unsigned integers were very rarely, if ever, used
+for arithmetic purposes, but in practice were much more often used as keys or
+identifiers. In this case, the sign is irrelevant. Signed integers serve this
+same purpose and can be safely cast to their unsigned counterparts (most
+commonly in C++) when absolutely necessary.
+
+\subsection{Structs}
+
+A Thrift struct defines a common object to be used across languages. A struct
+is essentially equivalent to a class in object-oriented programming
+languages. A struct has a set of strongly typed fields, each with a unique
+name identifier. The basic syntax for defining a Thrift struct looks very
+similar to a C struct definition. Fields may be annotated with an integer field
+identifier (unique to the scope of that struct) and optional default values.
+Field identifiers will be automatically assigned if omitted, though they are
+strongly encouraged for versioning reasons discussed later.
+
+\subsection{Containers}
+
+Thrift containers are strongly typed containers that map to the most commonly
+used containers in common programming languages. They are annotated using
+the C++ template (or Java Generics) style. There are three types available:
+\begin{itemize}
+\item \texttt{list<type>} An ordered list of elements. Translates directly into
+an STL \texttt{vector}, Java \texttt{ArrayList}, or native array in scripting languages. May
+contain duplicates.
+\item \texttt{set<type>} An unordered set of unique elements. Translates into
+an STL \texttt{set}, Java \texttt{HashSet}, \texttt{set} in Python, or native
+dictionary in PHP/Ruby.
+\item \texttt{map<type1,type2>} A map of strictly unique keys to values.
+Translates into an STL \texttt{map}, Java \texttt{HashMap}, PHP associative
+array, or Python/Ruby dictionary.
+\end{itemize}
+
+While defaults are provided, the type mappings are not explicitly fixed. Custom
+code generator directives have been added to substitute custom types in
+destination languages (e.g.
+\texttt{hash\_map} or Google's sparse hash map can be used in C++). The
+only requirement is that the custom types support all the necessary iteration
+primitives. Container elements may be of any valid Thrift type, including other
+containers or structs.
+
+\begin{verbatim}
+struct Example {
+  1:i32 number=10,
+  2:i64 bigNumber,
+  3:double decimals,
+  4:string name="thrifty"
+}\end{verbatim}
+
+In the target language, each definition generates a type with two methods,
+\texttt{read} and \texttt{write}, which perform serialization and transport
+of the objects using a Thrift TProtocol object.
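+
+As a further illustration, containers can appear as struct fields and can be
+nested inside one another. The following is a minimal sketch only: the struct
+name \texttt{Inventory} and its field names are hypothetical and are not taken
+from any deployed service.
+
+\begin{verbatim}
+struct Inventory {
+  1:list<string> tags,
+  2:set<i32> ids,
+  3:map<string,list<i64>> history
+}
+\end{verbatim}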
+
+\subsection{Exceptions}
+
+Exceptions are syntactically and functionally equivalent to structs except
+that they are declared using the \texttt{exception} keyword instead of the
+\texttt{struct} keyword.
+
+The generated objects inherit from an exception base class as appropriate
+in each target programming language, in order to seamlessly
+integrate with native exception handling in any given
+language. Again, the design emphasis is on making the code familiar to the
+application developer.
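+
+For instance, the \texttt{KeyNotFound} exception referenced by the
+\texttt{StringCache} example in the Services subsection might be declared as
+follows. This definition is assumed for illustration, and its fields are
+hypothetical.
+
+\begin{verbatim}
+exception KeyNotFound {
+  1:i32 key,
+  2:string message
+}
+\end{verbatim}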
+
+\subsection{Services}
+
+Services are defined using Thrift types. Definition of a service is
+semantically equivalent to defining an interface (or a pure virtual abstract
+class) in object-oriented
+programming. The Thrift compiler generates fully functional client and
+server stubs that implement the interface. Services are defined as follows:
+
+\begin{verbatim}
+service <name> {
+  <returntype> <name>(<arguments>)
+    [throws (<exceptions>)]
+  ...
+}\end{verbatim}
+
+An example:
+
+\begin{verbatim}
+service StringCache {
+  void set(1:i32 key, 2:string value),
+  string get(1:i32 key) throws (1:KeyNotFound knf),
+  void delete(1:i32 key)
+}
+\end{verbatim}
+
+Note that \texttt{void} is a valid type for a function return, in addition to
+all other defined Thrift types. Additionally, an \texttt{async} modifier
+keyword may be added to a \texttt{void} function, which will generate code that does
+not wait for a response from the server. Note that a pure \texttt{void}
+function will return a response to the client which guarantees that the
+operation has completed on the server side. With \texttt{async} method calls
+the client will only be guaranteed that the request succeeded at the
+transport layer. (In many transport scenarios this is inherently unreliable
+due to the Two Generals' Problem. Therefore, application developers
+should take care only to use the async optimization in cases where dropped
+method calls are acceptable or the transport is known to be reliable.)
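+
+As a sketch of the syntax, an \texttt{async} function could be declared as
+shown below. The service and function names are hypothetical; event logging is
+chosen only because an occasional dropped call is usually tolerable there.
+
+\begin{verbatim}
+service EventLog {
+  async void record(1:string message)
+}
+\end{verbatim}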
+
+Also of note is the fact that argument lists and exception lists for functions
+are implemented as Thrift structs. All three constructs are identical in both
+notation and behavior.
+
+\section{Transport}
+
+The transport layer is used by the generated code to facilitate data transfer.
+
+\subsection{Interface}
+
+A key design choice in the implementation of Thrift was to decouple the
+transport layer from the code generation layer. Though Thrift is typically
+used on top of the TCP/IP stack with streaming sockets as the base layer of
+communication, there was no compelling reason to build that constraint into
+the system. The performance tradeoff incurred by an abstracted I/O layer
+(roughly one virtual method lookup / function call per operation) was
+immaterial compared to the cost of actual I/O operations (typically invoking
+system calls).
+
+Fundamentally, generated Thrift code only needs to know how to read and
+write data. The origin and destination of the data are irrelevant; it may be a
+socket, a segment of shared memory, or a file on the local disk. The Thrift
+transport interface supports the following methods:
+
+\begin{itemize}
+\item \texttt{open} Opens the transport
+\item \texttt{close} Closes the transport
+\item \texttt{isOpen} Indicates whether the transport is open
+\item \texttt{read} Reads from the transport
+\item \texttt{write} Writes to the transport
+\item \texttt{flush} Forces any pending writes
+\end{itemize}
+
+There are a few additional methods not documented here which are used to aid
+in batching reads and optionally signaling the completion of a read or
+write operation from the generated code.
+
+In addition to the above
+\texttt{TTransport} interface, there is a\\
+\texttt{TServerTransport} interface
+used to accept or create primitive transport objects. Its interface is as
+follows:
+
+\begin{itemize}
+\item \texttt{open} Opens the transport
+\item \texttt{listen} Begins listening for connections
+\item \texttt{accept} Returns a new client transport
+\item \texttt{close} Closes the transport
+\end{itemize}
+
+\subsection{Implementation}
+
+The transport interface is designed for simple implementation in any
+programming language. New transport mechanisms can be easily defined as needed
+by application developers.
+
+\subsubsection{TSocket}
+
+The \texttt{TSocket} class is implemented across all target languages. It
+provides a common, simple interface to a TCP/IP stream socket.
+
+\subsubsection{TFileTransport}
+
+The \texttt{TFileTransport} is an abstraction of an on-disk file to a data
+stream. It can be used to write out a set of incoming Thrift requests to a file
+on disk. The on-disk data can then be replayed from the log, either for
+post-processing or for reproduction and/or simulation of past events.
+
+\subsubsection{Utilities}
+
+The Transport interface is designed to support easy extension using common
+OOP techniques, such as composition. Some simple utilities include the
+\texttt{TBufferedTransport}, which buffers the writes and reads on an
+underlying transport, the \texttt{TFramedTransport}, which transmits data with frame
+size headers for chunking optimization or nonblocking operation, and the
+\texttt{TMemoryBuffer}, which allows reading and writing directly from the heap
+or stack memory owned by the process.
+
+\section{Protocol}
+
+A second major abstraction in Thrift is the separation of data structure from
+transport representation. Thrift enforces a certain messaging structure when
+transporting data, but it is agnostic to the protocol encoding in use. That is,
+it does not matter whether data is encoded as XML, human-readable ASCII, or a
+dense binary format as long as the data supports a fixed set of operations
+that allow it to be deterministically read and written by generated code.
+
+\subsection{Interface}
+
+The Thrift Protocol interface is very straightforward. It fundamentally
+supports two things: 1) bidirectional sequenced messaging, and
+2) encoding of base types, containers, and structs.
+
+\begin{verbatim}
+writeMessageBegin(name, type, seq)
+writeMessageEnd()
+writeStructBegin(name)
+writeStructEnd()
+writeFieldBegin(name, type, id)
+writeFieldEnd()
+writeFieldStop()
+writeMapBegin(ktype, vtype, size)
+writeMapEnd()
+writeListBegin(etype, size)
+writeListEnd()
+writeSetBegin(etype, size)
+writeSetEnd()
+writeBool(bool)
+writeByte(byte)
+writeI16(i16)
+writeI32(i32)
+writeI64(i64)
+writeDouble(double)
+writeString(string)
+
+name, type, seq = readMessageBegin()
+                  readMessageEnd()
+name =            readStructBegin()
+                  readStructEnd()
+name, type, id =  readFieldBegin()
+                  readFieldEnd()
+k, v, size =      readMapBegin()
+                  readMapEnd()
+etype, size =     readListBegin()
+                  readListEnd()
+etype, size =     readSetBegin()
+                  readSetEnd()
+bool =            readBool()
+byte =            readByte()
+i16 =             readI16()
+i32 =             readI32()
+i64 =             readI64()
+double =          readDouble()
+string =          readString()
+\end{verbatim}
+
+Note that every \texttt{write} function has exactly one \texttt{read} counterpart, with
+the exception of \texttt{writeFieldStop()}. This is a special method
+that signals the end of a struct. The procedure for reading a struct is to
+\texttt{readFieldBegin()} until the stop field is encountered, and then to
+\texttt{readStructEnd()}.  The
+generated code relies upon this call sequence to ensure that everything written by
+a protocol encoder can be read by a matching protocol decoder. Further note
+that this set of functions is by design more robust than necessary.
+For example, \texttt{writeStructEnd()} is not strictly necessary, as the end of
+a struct may be implied by the stop field. This method is a convenience for
+verbose protocols in which it is cleaner to separate these calls (e.g. a closing
+\texttt{</struct>} tag in XML).
+
+\subsection{Structure}
+
+Thrift structures are designed to support encoding into a streaming
+protocol. The implementation should never need to frame or compute the
+entire data length of a structure prior to encoding it. This is critical to
+performance in many scenarios. Consider a long list of relatively large
+strings. If the protocol interface required reading or writing a list to be an
+atomic operation, then the implementation would need to perform a linear pass over the
+entire list before encoding any data. However, if the list can be written
+as iteration is performed, the corresponding read may begin in parallel,
+theoretically offering an end-to-end speedup of $(kN - C)$, where $N$ is the size
+of the list, $k$ the cost factor associated with serializing a single
+element, and $C$ is a fixed offset for the delay between data being written
+and becoming available to read.
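+
+One way to arrive at this figure, under the simplifying assumption (made here
+for illustration, not measured) that reading an element costs the same $k$ as
+writing it: if the list is handled atomically, the read cannot begin until the
+write has finished, so the end-to-end time is roughly $kN + kN$. If the list is
+streamed, the read trails the write by only $C$ and finishes at about
+$kN + C$. The time saved is therefore
+\[
+(kN + kN) - (kN + C) = kN - C.
+\]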
+
+Similarly, structs do not encode their data lengths a priori. Instead, they are
+encoded as a sequence of fields, with each field having a type specifier and a
+unique field identifier. Note that the inclusion of type specifiers allows
+the protocol to be safely parsed and decoded without any generated code
+or access to the original IDL file. Structs are terminated by a field header
+with a special \texttt{STOP} type. Because all the basic types can be read
+deterministically, all structs (even those containing other structs) can be
+read deterministically. The Thrift protocol is self-delimiting without any
+framing and regardless of the encoding format.
+
+In situations where streaming is unnecessary or framing is advantageous, it
+can be very simply added into the transport layer, using the
+\texttt{TFramedTransport} abstraction.
+
+\subsection{Implementation}
+
+Facebook has implemented and deployed a space-efficient binary protocol which
+is used by most backend services. Essentially, it writes all data
+in a flat binary format. Integer types are converted to network byte order,
+strings are prepended with their byte length, and all message and field headers
+are written using the primitive integer serialization constructs. String names
+for fields are omitted; when using generated code, field identifiers are
+sufficient.
+
+We decided against some extreme storage optimizations (e.g. packing
+small integers into ASCII or using a 7-bit continuation format) for the sake
+of simplicity and clarity in the code. These alterations can easily be made
+if and when we encounter a performance-critical use case that demands them.
+
+\section{Versioning}
+
+Thrift is robust in the face of versioning and data definition changes. This
+is critical to enable staged rollouts of changes to deployed services. The
+system must be able to support reading of old data from log files, as well as
+requests from out-of-date clients to new servers, and vice versa.
+
+\subsection{Field Identifiers}
+
+Versioning in Thrift is implemented via field identifiers. The field header
+for every member of a struct in Thrift is encoded with a unique field
+identifier. The combination of this field identifier and its type specifier
+is used to uniquely identify the field. The Thrift definition language
+supports automatic assignment of field identifiers, but it is good
+programming practice to always explicitly specify field identifiers.
+Identifiers are specified as follows:
+
+\begin{verbatim}
+struct Example {
+  1:i32 number=10,
+  2:i64 bigNumber,
+  3:double decimals,
+  4:string name="thrifty"
+}\end{verbatim}
+
+To avoid conflicts between manually and automatically assigned identifiers,
+fields with identifiers omitted are assigned identifiers
+decrementing from -1, and the language only supports the manual assignment of
+positive identifiers.
+
+When data is being deserialized, the generated code can use these identifiers
+to properly identify the field and determine whether it aligns with a field in
+its definition file. If a field identifier is not recognized, the generated
+code can use the type specifier to skip the unknown field without any error.
+Again, this is possible due to the fact that all datatypes are
+self-delimiting.
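+
+As an illustration, a later revision of the \texttt{Example} struct shown
+above might add a fifth field. The field \texttt{tags} is hypothetical. A
+reader generated from the original definition would skip field 5 using its
+type specifier, while a reader generated from this revision would simply not
+find field 5 in older data.
+
+\begin{verbatim}
+struct Example {
+  1:i32 number=10,
+  2:i64 bigNumber,
+  3:double decimals,
+  4:string name="thrifty",
+  5:list<string> tags
+}
+\end{verbatim}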
+
+Field identifiers can (and should) also be specified in function argument
+lists. In fact, argument lists are not only represented as structs on the
+backend, but actually share the same code in the compiler frontend. This
+allows for version-safe modification of method parameters.
+
+\begin{verbatim}
+service StringCache {
+  void set(1:i32 key, 2:string value),
+  string get(1:i32 key) throws (1:KeyNotFound knf),
+  void delete(1:i32 key)
+}
+\end{verbatim}
+
+The syntax for specifying field identifiers was chosen to echo their structure.
+Structs can be thought of as a dictionary where the identifiers are keys, and
+the values are strongly-typed named fields.
+
+Field identifiers internally use the \texttt{i16} Thrift type. Note, however,
+that the \texttt{TProtocol} abstraction may encode identifiers in any format.
+
+\subsection{Isset}
+
+When an unexpected field is encountered, it can be safely ignored and
+discarded. When an expected field is not found, there must be some way to
+signal to the developer that it was not present. This is implemented via an
+inner \texttt{isset} structure inside the defined objects. (Isset functionality
+is implicit with a \texttt{null} value in PHP, \texttt{None} in Python
+and \texttt{nil} in Ruby.) Essentially,
+the inner \texttt{isset} object of each Thrift struct contains a boolean value
+for each field which denotes whether or not that field is present in the
+struct. When a reader receives a struct, it should check for a field being set
+before operating directly on it.
+
+\begin{verbatim}
+class Example {
+ public:
+  Example() :
+    number(10),
+    bigNumber(0),
+    decimals(0),
+    name("thrifty") {}
+
+  int32_t number;
+  int64_t bigNumber;
+  double decimals;
+  std::string name;
+
+  struct __isset {
+    __isset() :
+      number(false),
+      bigNumber(false),
+      decimals(false),
+      name(false) {}
+    bool number;
+    bool bigNumber;
+    bool decimals;
+    bool name;
+  } __isset;
+  // ... remaining generated members elided
+};
+\end{verbatim}
+
+\subsection{Case Analysis}
+
+There are four cases in which version mismatches may occur.
+
+\begin{enumerate}
+\item \textit{Added field, old client, new server.} In this case, the old
+client does not send the new field. The new server recognizes that the field
+is not set, and implements default behavior for out-of-date requests.
+\item \textit{Removed field, old client, new server.} In this case, the old
+client sends the removed field. The new server simply ignores it.
+\item \textit{Added field, new client, old server.} The new client sends a
+field that the old server does not recognize. The old server simply ignores
+it and processes as normal.
+\item \textit{Removed field, new client, old server.} This is the most
+dangerous case, as the old server is unlikely to have suitable default
+behavior implemented for the missing field. It is recommended that in this
+situation the new server be rolled out prior to the new clients.
+\end{enumerate}
+
+\subsection{Protocol/Transport Versioning}
+The \texttt{TProtocol} abstractions are also designed to give protocol
+implementations the freedom to version themselves in whatever manner they
+see fit. Specifically, any protocol implementation is free to send whatever
+it likes in the \texttt{writeMessageBegin()} call. It is entirely up to the
+implementor how to handle versioning at the protocol level. The key point is
+that protocol encoding changes are safely isolated from interface definition
+version changes.
+
+Note that the exact same is true of the \texttt{TTransport} interface. For
+example, if we wished to add some new checksumming or error detection to the
+\texttt{TFileTransport}, we could simply add a version header into the
+data it writes to the file in such a way that it would still accept old
+log files without the given header.
+
+\section{RPC Implementation}
+
+\subsection{TProcessor}
+
+The last core interface in the Thrift design is the \texttt{TProcessor},
+perhaps the simplest of the constructs. The interface is as follows:
+
+\begin{verbatim}
+interface TProcessor {
+  bool process(TProtocol in, TProtocol out)
+    throws TException
+}
+\end{verbatim}
+
+The key design idea here is that the complex systems we build can fundamentally
+be broken down into agents or services that operate on inputs and outputs. In
+most cases, there is actually just one input and output (an RPC client) that
+needs handling.
+
+\subsection{Generated Code}
+
+When a service is defined, we generate a
+\texttt{TProcessor} instance capable of handling RPC requests to that service,
+using a few helpers. The fundamental structure (illustrated in pseudo-C++) is
+as follows:
+
+\begin{verbatim}
+Service.thrift
+ => Service.cpp
+     interface ServiceIf
+     class ServiceClient : virtual ServiceIf
+       TProtocol in
+       TProtocol out
+     class ServiceProcessor : TProcessor
+       ServiceIf handler
+
+ServiceHandler.cpp
+ class ServiceHandler : virtual ServiceIf
+
+TServer.cpp
+ TServer(TProcessor processor,
+         TServerTransport transport,
+         TTransportFactory tfactory,
+         TProtocolFactory pfactory)
+ serve()
+\end{verbatim}
+
+From the Thrift definition file, we generate the virtual service interface.
+A client class is generated, which implements the interface and
+uses two \texttt{TProtocol} instances to perform the I/O operations. The
+generated processor implements the \texttt{TProcessor} interface. The generated
+code has all the logic to handle RPC invocations via the \texttt{process()}
+call, and takes as a parameter an instance of the service interface, as
+implemented by the application developer.
+
+The user provides an implementation of the application interface in separate,
+non-generated source code.
+
+\subsection{TServer}
+
+Finally, the Thrift core libraries provide a \texttt{TServer} abstraction.
+The \texttt{TServer} object generally works as follows.
+
+\begin{itemize}
+\item Use the \texttt{TServerTransport} to get a \texttt{TTransport}
+\item Use the \texttt{TTransportFactory} to optionally convert the primitive
+transport into a suitable application transport (typically the
+\texttt{TBufferedTransportFactory} is used here)
+\item Use the \texttt{TProtocolFactory} to create an input and output protocol
+for the \texttt{TTransport}
+\item Invoke the \texttt{process()} method of the \texttt{TProcessor} object
+\end{itemize}
+
+The layers are appropriately separated such that the server code needs to know
+nothing about any of the transports, encodings, or applications in play. The
+server encapsulates the logic around connection handling, threading, etc.
+while the processor deals with RPC. The only code written by the application
+developer lives in the definitional Thrift file and the interface
+implementation.
+
+Facebook has deployed multiple \texttt{TServer} implementations, including
+the single-threaded \texttt{TSimpleServer}, thread-per-connection
+\texttt{TThreadedServer}, and thread-pooling \texttt{TThreadPoolServer}.
+
+The \texttt{TProcessor} interface is very general by design. There is no
+requirement that a \texttt{TServer} take a generated \texttt{TProcessor}
+object. Thrift allows the application developer to easily write any type of
+server that operates on \texttt{TProtocol} objects (for instance, a server
+could simply stream a certain type of object without any actual RPC method
+invocation).
+
+\section{Implementation Details}
+\subsection{Target Languages}
+Thrift currently supports five target languages: C++, Java, Python, Ruby, and
+PHP. At Facebook, we have deployed servers predominantly in C++, Java, and
+Python. Thrift services implemented in PHP have also been embedded into the
+Apache web server, providing transparent backend access to many of our
+frontend constructs using a \texttt{THttpClient} implementation of the
+\texttt{TTransport} interface.
+
+Though Thrift was explicitly designed to be much more efficient and robust
+than typical web technologies, as we were designing our XML-based REST web
+services API we noticed that Thrift could be easily used to define our
+service interface. Though we do not currently employ SOAP envelopes (in the
+authors' opinions there is already far too much repetitive enterprise Java
+software to do that sort of thing), we were able to quickly extend Thrift to
+generate XML Schema Definition files for our service, as well as a framework
+for versioning different implementations of our web service. Though public
+web services are admittedly tangential to Thrift's core use case and design,
+Thrift facilitated rapid iteration and affords us the ability to quickly
+migrate our entire XML-based web service onto a higher performance system
+should the need arise.
+
+\subsection{Generated Structs}
+We made a conscious decision to make our generated structs as transparent as
+possible. All fields are publicly accessible; there are no \texttt{set()} and
+\texttt{get()} methods. Similarly, use of the \texttt{isset} object is not
+enforced. We do not include any \texttt{FieldNotSetException} construct.
+Developers have the option to use these fields to write more robust code, but
+the system is robust to the developer ignoring the \texttt{isset} construct
+entirely and will provide suitable default behavior in all cases.
+
+This choice was motivated by the desire to ease application development. Our stated
+goal is not to make developers learn a rich new library in their language of
+choice, but rather to generate code that allows them to work with the constructs
+that are most familiar in each language.
+
+We also made the \texttt{read()} and \texttt{write()} methods of the generated
+objects public so that the objects can be used outside of the context
+of RPC clients and servers. Thrift is a useful tool simply for generating
+objects that are easily serializable across programming languages.
+
+\subsection{RPC Method Identification}
+Method calls in RPC are implemented by sending the method name as a string. One
+issue with this approach is that longer method names require more bandwidth.
+We experimented with using fixed-size hashes to identify methods, but in the
+end concluded that the savings were not worth the headaches incurred. Reliably
+dealing with conflicts across versions of an interface definition file is
+impossible without a meta-storage system (i.e. to generate non-conflicting
+hashes for the current version of a file, we would have to know about all
+conflicts that ever existed in any previous version of the file).
+
+We wanted to avoid too many unnecessary string comparisons upon
+method invocation. To deal with this, we generate maps from strings to function
+pointers, so that invocation is effectively accomplished via a constant-time
+hash lookup in the common case. This requires the use of a couple of interesting
+code constructs. Because Java does not have function pointers, process
+functions are all private member classes implementing a common interface.
+
+\begin{verbatim}
+private class ping implements ProcessFunction {
+  public void process(int seqid,
+                      TProtocol iprot,
+                      TProtocol oprot)
+    throws TException
+  { ...}
+}
+
+HashMap<String,ProcessFunction> processMap_ =
+  new HashMap<String,ProcessFunction>();
+\end{verbatim}
+
+In C++, we use a relatively esoteric language construct: member function
+pointers.
+
+\begin{verbatim}
+std::map<std::string,
+  void (ExampleServiceProcessor::*)(int32_t,
+  facebook::thrift::protocol::TProtocol*,
+  facebook::thrift::protocol::TProtocol*)>
+ processMap_;
+\end{verbatim}
+
+Using these techniques, the cost of string processing is minimized, and we
+reap the benefit of being able to easily debug corrupt or misunderstood data by
+inspecting it for known string method names.
+
+\subsection{Servers and Multithreading}
+Thrift services require basic multithreading to handle simultaneous
+requests from multiple clients. For the Python and Java implementations of
+Thrift server logic, the standard threading libraries distributed with the
+languages provide adequate support. For the C++ implementation, no standard multithread runtime
+library exists. Specifically, robust, lightweight, and portable
+thread manager and timer class implementations do not exist. We investigated
+existing implementations, namely \texttt{boost::thread},
+\texttt{boost::threadpool}, \texttt{ACE\_Thread\_Manager} and
+\texttt{ACE\_Timer}.
+
+While \texttt{boost::thread}\cite{boost.threads} provides clean,
+lightweight and robust implementations of multi-thread primitives (mutexes,
+conditions, threads), it does not provide a thread manager or timer
+implementation.
+
+\texttt{boost::threadpool}\cite{boost.threadpool} also looked promising but
+was not far enough along for our purposes. We wanted to limit the dependency on
+third-party libraries as much as possible. Because\\
+\texttt{boost::threadpool} is
+not a pure template library and requires runtime libraries, and because it is
+not yet part of the official Boost distribution, we felt it was not ready for
+use in Thrift. As \texttt{boost::threadpool} evolves, and especially if it is
+added to the Boost distribution, we may reconsider our decision not to use it.
+
+ACE has both a thread manager and timer class in addition to multi-thread
+primitives. The biggest problem with ACE is that it is ACE. Unlike Boost, ACE
+API quality is poor. Everything in ACE has large numbers of dependencies on
+everything else in ACE - thus forcing developers to throw out standard
+classes, such as STL collections, in favor of ACE's homebrewed implementations. In
+addition, unlike Boost, ACE implementations demonstrate little understanding
+of the power and pitfalls of C++ programming and take no advantage of modern
+templating techniques to ensure compile time safety and reasonable compiler
+error messages. For all these reasons, ACE was rejected. Instead, we chose
+to implement our own library, described in the following sections.
+
+\subsection{Thread Primitives}
+
+The Thrift thread libraries are implemented in the namespace\\
+\texttt{facebook::thrift::concurrency} and have three components:
+\begin{itemize}
+\item primitives
+\item thread pool manager
+\item timer manager
+\end{itemize}
+
+As mentioned above, we were hesitant to introduce any additional dependencies
+into Thrift. We decided to use \texttt{boost::shared\_ptr} because it is so
+useful for multithreaded applications, it requires no link-time or
+runtime libraries (i.e., it is a pure template library), and it is due
+to become part of the C++0x standard.
+
+We implement standard \texttt{Mutex} and \texttt{Condition} classes, and a
+ \texttt{Monitor} class. The latter is simply a combination of a mutex and
+condition variable and is analogous to the \texttt{Monitor} implementation provided for
+the Java \texttt{Object} class. This is also sometimes referred to as a barrier. We
+provide a \texttt{Synchronized} guard class to allow Java-like synchronized blocks.
+This is just a bit of syntactic sugar, but, like its Java counterpart, clearly
+delimits critical sections of code. Unlike its Java counterpart, we still
+have the ability to programmatically lock, unlock, block, and signal monitors.
+
+\begin{verbatim}
+void run() {
+ {Synchronized s(manager->monitor);
+  if (manager->state == TimerManager::STARTING) {
+    manager->state = TimerManager::STARTED;
+    manager->monitor.notifyAll();
+  }
+ }
+}
+\end{verbatim}
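+
+A minimal sketch of how such a monitor and guard could be built is given
+here. It assumes the C++11 \texttt{std::mutex} and
+\texttt{std::condition\_variable} types rather than the POSIX primitives the
+library wraps, and the helper \texttt{enterTwice} is a hypothetical usage
+example.
+

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Illustrative Monitor: a mutex paired with a condition variable.
class Monitor {
 public:
  void lock() { mutex_.lock(); }
  void unlock() { mutex_.unlock(); }

  // The caller must already hold the lock. The lock is released while
  // waiting and held again on return. Wakeups may be spurious, so callers
  // should re-check their condition in a loop.
  void wait() {
    std::unique_lock<std::mutex> held(mutex_, std::adopt_lock);
    cond_.wait(held);
    held.release();  // leave the mutex locked for the caller
  }

  void notifyAll() { cond_.notify_all(); }

 private:
  std::mutex mutex_;
  std::condition_variable cond_;
};

// Scoped guard: locks in the constructor, unlocks in the destructor.
class Synchronized {
 public:
  explicit Synchronized(Monitor& m) : monitor_(m) { monitor_.lock(); }
  ~Synchronized() { monitor_.unlock(); }
  Synchronized(const Synchronized&) = delete;
  Synchronized& operator=(const Synchronized&) = delete;

 private:
  Monitor& monitor_;
};

// Hypothetical usage: the second block can only acquire the monitor
// because the first guard released it when its scope ended.
inline int enterTwice(Monitor& m) {
  int entered = 0;
  { Synchronized s(m); ++entered; }
  { Synchronized s(m); ++entered; }
  assert(entered == 2);
  return entered;
}
```

+
+Because the guard only calls \texttt{lock} and \texttt{unlock}, code that
+needs finer control can still call the monitor's methods directly.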
+
+We again borrowed from Java the distinction between a thread and a runnable
+class. A \texttt{Thread} is the actual schedulable object. The
+\texttt{Runnable} is the logic to execute within the thread.
+The \texttt{Thread} implementation deals with all the platform-specific thread
+creation and destruction issues, while the \texttt{Runnable} implementation deals
+with the application-specific per-thread logic. The benefit of this approach
+is that developers can easily subclass the Runnable class without pulling in
+platform-specific super-classes.
+
+\subsection{Thread, Runnable, and shared\_ptr}
+We use \texttt{boost::shared\_ptr} throughout the \texttt{ThreadManager} and
+\texttt{TimerManager} implementations to guarantee cleanup of dead objects that can
+be accessed by multiple threads. For \texttt{Thread} class implementations,
+\texttt{boost::shared\_ptr} usage requires particular attention to make sure
+\texttt{Thread} objects are neither leaked nor dereferenced prematurely while
+creating and shutting down threads.
+
+Thread creation requires calling into a C library. (In our case the POSIX
+thread library, \texttt{libpthread}, but the same would be true for WIN32 threads).
+Typically, the OS makes few, if any, guarantees about when \texttt{ThreadMain}, a C thread's entry-point function, will be called. Therefore, it is
+possible that our thread create call,
+\texttt{ThreadFactory::newThread()} could return to the caller
+well before that time. To ensure that the returned \texttt{Thread} object is not
+prematurely cleaned up if the caller gives up its reference prior to the
+\texttt{ThreadMain} call, the \texttt{Thread} object makes a weak reference to
+itself in its \texttt{start} method.
+
+With the weak reference in hand the \texttt{ThreadMain} function can attempt to get
+a strong reference before entering the \texttt{Runnable::run} method of the
+\texttt{Runnable} object bound to the \texttt{Thread}. If no strong references to the
+thread are obtained between exiting \texttt{Thread::start} and entering \texttt{ThreadMain}, the weak reference returns \texttt{null} and the function
+exits immediately.
+
+The need for the \texttt{Thread} to make a weak reference to itself has a
+significant impact on the API. Since references are managed through the
+\texttt{boost::shared\_ptr} templates, the \texttt{Thread} object must have a reference
+to itself wrapped by the same \texttt{boost::shared\_ptr} envelope that is returned
+to the caller. This necessitated the use of the factory pattern.
+\texttt{ThreadFactory} creates the raw \texttt{Thread} object and a
+\texttt{boost::shared\_ptr} wrapper, and calls a private helper method of the class
+implementing the \texttt{Thread} interface (in this case, \texttt{PosixThread::weakRef})
+ to allow it to make a weak reference to itself through the
+ \texttt{boost::shared\_ptr} envelope.
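+
+The factory arrangement described above can be sketched as follows. This is
+an illustration under assumptions: it uses \texttt{std::shared\_ptr} and
+\texttt{std::weak\_ptr} in place of the Boost types, creates no real OS
+thread, and the names \texttt{SketchThread}, \texttt{CountingRunnable} and the
+free function \texttt{newThread} are hypothetical.
+

```cpp
#include <cassert>
#include <memory>

// Logic to execute within a thread.
class Runnable {
 public:
  virtual ~Runnable() {}
  virtual void run() = 0;
};

// Hypothetical Runnable that counts how often it ran.
struct CountingRunnable : Runnable {
  int runs = 0;
  void run() override { ++runs; }
};

// Stand-in for a Thread implementation; no OS thread is started here.
class SketchThread {
 public:
  explicit SketchThread(std::shared_ptr<Runnable> r) : runnable_(r) {}

  // Called by the factory so the object can refer to itself through the
  // same reference count that the caller's shared_ptr uses.
  void weakRef(const std::shared_ptr<SketchThread>& self) {
    assert(self.get() == this);
    self_ = self;
  }

  std::weak_ptr<SketchThread> self() const { return self_; }

  // Stand-in for the C entry point: promote the weak reference first.
  // Returns false if every strong reference was dropped before it ran.
  static bool threadMain(const std::weak_ptr<SketchThread>& weak) {
    std::shared_ptr<SketchThread> strong = weak.lock();
    if (!strong) return false;
    strong->runnable_->run();
    return true;
  }

 private:
  std::shared_ptr<Runnable> runnable_;
  std::weak_ptr<SketchThread> self_;
};

// Factory: create the raw object, wrap it, then hand the wrapper back in.
inline std::shared_ptr<SketchThread> newThread(std::shared_ptr<Runnable> r) {
  std::shared_ptr<SketchThread> t(new SketchThread(r));
  t->weakRef(t);
  return t;
}
```

+
+A constructor cannot perform the \texttt{weakRef} step itself, since the
+shared wrapper does not exist until construction has finished; this is why
+the factory is needed.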
+
+\texttt{Thread} and \texttt{Runnable} objects reference each other. A \texttt{Runnable}
+object may need to know about the thread in which it is executing, and a \texttt{Thread}, obviously,
+needs to know what \texttt{Runnable} object it is hosting. This interdependency is
+further complicated because the lifecycle of each object is independent of the
+other. An application may create a set of \texttt{Runnable} objects to be reused in different threads, or it may create and forget a \texttt{Runnable} object
+once a thread has been created and started for it.
+
+The \texttt{Thread} class takes a \texttt{boost::shared\_ptr} reference to the hosted
+\texttt{Runnable} object in its constructor, while the \texttt{Runnable} class has an
+explicit \texttt{thread} method to allow explicit binding of the hosted thread.
+\texttt{ThreadFactory::newThread} binds the objects to each other.
+
+\subsection{ThreadManager}
+
+\texttt{ThreadManager} creates a pool of worker threads and
+allows applications to schedule tasks for execution as free worker threads
+become available. The \texttt{ThreadManager} does not implement dynamic
+thread pool resizing, but provides primitives so that applications can add
+and remove threads based on load. This approach was chosen because
+implementing load metrics and thread pool size is very application
+specific. For example some applications may want to adjust pool size based
+on running-average of work arrival rates that are measured via polled
+samples. Others may simply wish to react immediately to work-queue
+depth high and low water marks. Rather than trying to create a complex
+API abstract enough to capture these different approaches, we
+simply leave it up to the particular application and provide the
+primitives to enact the desired policy and sample current status.
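+
+As an example of the kind of policy an application might layer on top of
+these primitives, the sketch below reacts to work-queue high and low water
+marks. The \texttt{PoolStatus} and \texttt{WaterMarks} structures and the
+\texttt{resizeDecision} function are hypothetical; they are not part of the
+\texttt{ThreadManager} interface.
+

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of the pool that an application might sample.
struct PoolStatus {
  std::size_t workerCount;
  std::size_t pendingTaskCount;
};

// Hypothetical policy settings chosen by the application.
struct WaterMarks {
  std::size_t low;         // shrink when the queue is at or below this depth
  std::size_t high;        // grow when the queue is at or above this depth
  std::size_t minWorkers;  // never shrink below this many workers
  std::size_t maxWorkers;  // never grow beyond this many workers
};

// Returns +1 to add a worker, -1 to remove one, 0 to leave the pool alone.
inline int resizeDecision(const PoolStatus& s, const WaterMarks& m) {
  assert(m.low < m.high && m.minWorkers <= m.maxWorkers);
  if (s.pendingTaskCount >= m.high && s.workerCount < m.maxWorkers) return 1;
  if (s.pendingTaskCount <= m.low && s.workerCount > m.minWorkers) return -1;
  return 0;
}
```

+
+An application preferring a running average of arrival rates would replace
+only this function and keep using the same add and remove primitives.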
+
+\subsection{TimerManager}
+
+\texttt{TimerManager} allows applications to schedule
+ \texttt{Runnable} objects for execution at some point in the future. Its specific task
+is to allow applications to sample \texttt{ThreadManager} load at regular
+intervals and make changes to the thread pool size based on application policy.
+Of course, it can be used to generate any number of timer or alarm events.
+
+The default implementation of \texttt{TimerManager} uses a single thread to
+execute expired \texttt{Runnable} objects. Thus, if a timer operation needs to
+do a large amount of work and especially if it needs to do blocking I/O,
+that should be done in a separate thread.
+
+\subsection{Nonblocking Operation}
+Though the Thrift transport interfaces map more directly to a blocking I/O
+model, we have implemented a high performance \texttt{TNonBlockingServer}
+in C++ based on \texttt{libevent} and the \texttt{TFramedTransport}. We
+implemented this by moving all I/O into one tight event loop using a
+state machine. Essentially, the event loop reads framed requests into
+\texttt{TMemoryBuffer} objects. Once entire requests are ready, they are
+dispatched to the \texttt{TProcessor} object which can read directly from
+the data in memory.
+
+\subsection{Compiler}
+The Thrift compiler is implemented in C++ using standard \texttt{lex}/\texttt{yacc}
+lexing and parsing. Though it could have been implemented with fewer
+lines of code in another language (e.g., Python Lex-Yacc (PLY) or \texttt{ocamlyacc}), using C++
+forces explicit definition of the language constructs. Strongly typing the
+parse tree elements (debatably) makes the code more approachable for new
+developers.
+
+Code generation is done using two passes. The first pass looks only for
+include files and type definitions. Type definitions are not checked during
+this phase, since they may depend upon include files. All included files
+are sequentially scanned in a first pass. Once the include tree has been
+resolved, a second pass over all files is taken that inserts type definitions
+into the parse tree and raises an error on any undefined types. The program is
+then generated against the parse tree.
+
+Due to inherent complexities and potential for circular dependencies,
+we explicitly disallow forward declaration. Two Thrift structs cannot
+each contain an instance of the other. (Since we do not allow \texttt{null}
+struct instances in the generated C++ code, this would actually be impossible.)
+
+\subsection{TFileTransport}
+The \texttt{TFileTransport} logs Thrift requests/structs by
+framing incoming data with its length and writing it out to disk.
+Using a framed on-disk format allows for better error checking and
+helps with the processing of a finite number of discrete events. The\\
+\texttt{TFileWriterTransport} uses a system of swapping in-memory buffers
+to ensure good performance while logging large amounts of data.
+A Thrift log file is split up into chunks of a specified size; logged messages
+are not allowed to cross chunk boundaries. If a message would cross a chunk
+boundary, padding is added up to the end of the current chunk so that the
+first byte of the message is aligned to the beginning of the next chunk.
+Partitioning the file into chunks makes it possible to read and interpret data
+from a particular point in the file.
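+
+The chunk rule can be illustrated with a small offset calculation. The
+function \texttt{frameStartOffset} is a hypothetical helper, and it assumes
+that a framed message (length prefix included) never exceeds the chunk size,
+a case the text above does not cover.
+

```cpp
#include <cassert>
#include <cstdint>

// Returns the file offset at which a framed message of frameSize bytes
// would start, given the current write offset. If the frame would cross
// the end of the current chunk, it starts at the next chunk boundary and
// the gap in between is padding.
inline uint64_t frameStartOffset(uint64_t offset,
                                 uint64_t frameSize,
                                 uint64_t chunkSize) {
  assert(chunkSize > 0 && frameSize <= chunkSize);
  const uint64_t chunkEnd = (offset / chunkSize + 1) * chunkSize;
  if (offset + frameSize > chunkEnd) return chunkEnd;  // pad to next chunk
  return offset;
}
```

+
+A reader that seeks to any multiple of the chunk size is therefore always
+positioned at the start of a frame or at padding, never in the middle of a
+message.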
+
+\section{Facebook Thrift Services}
+Thrift has been employed in a large number of applications at Facebook, including
+search, logging, mobile, ads and the developer platform. Two specific usages are discussed below.
+
+\subsection{Search}
+Thrift is used as the underlying protocol and transport layer for the Facebook Search service.
+The multi-language code generation is well suited for search because it allows for application
+development in an efficient server side language (C++) and allows the Facebook PHP-based web application
+to make calls to the search service using Thrift PHP libraries. There is also a large
+variety of search stats, deployment and testing functionality that is built on top
+of generated Python code. Additionally, the Thrift log file format is
+used as a redo log for providing real-time search index updates. Thrift has allowed the
+search team to leverage each language for its strengths and to develop code at a rapid pace.
+
+\subsection{Logging}
+The Thrift \texttt{TFileTransport} functionality is used for structured logging. Each
+service function definition along with its parameters can be considered to be
+a structured log entry identified by the function name. This log can then be used for
+a variety of purposes, including inline and offline processing, stats aggregation and as a redo log.
+
+\section{Conclusions}
+Thrift has enabled Facebook to build scalable backend
+services efficiently by enabling engineers to divide and conquer. Application
+developers can focus on application code without worrying about the
+sockets layer. We avoid duplicated work by writing buffering and I/O logic
+in one place, rather than interspersing it in each application.
+
+Thrift has been employed in a wide variety of applications at Facebook,
+including search, logging, mobile, ads, and the developer platform. We have
+found that the marginal performance cost incurred by an extra layer of
+software abstraction is far eclipsed by the gains in developer efficiency and
+systems reliability.
+
+\appendix
+
+\section{Similar Systems}
+The following are software systems similar to Thrift. Each is (very!) briefly
+described:
+
+\begin{itemize}
+\item \textit{SOAP.} XML-based. Designed for web services via HTTP, excessive
+XML parsing overhead.
+\item \textit{CORBA.} Relatively comprehensive, debatably overdesigned and
+heavyweight. Comparably cumbersome software installation.
+\item \textit{COM.} Embraced mainly in Windows client software. Not an entirely
+open solution.
+\item \textit{Pillar.} Lightweight and high-performance, but missing versioning
+and abstraction.
+\item \textit{Protocol Buffers.} Closed-source, owned by Google. Described in
+Sawzall paper.
+\end{itemize}
+
+\acks
+
+Many thanks for feedback on Thrift (and extreme trial by fire) are due to
+Martin Smith, Karl Voskuil and Yishan Wong.
+
+Thrift is a successor to Pillar, a similar system developed
+by Adam D'Angelo, first while at Caltech and continued later at Facebook.
+Thrift simply would not have happened without Adam's insights.
+
+\begin{thebibliography}{}
+
+\bibitem{boost.threads}
+Kempf, William,
+``Boost.Threads'',
+\url{http://www.boost.org/doc/html/threads.html}
+
+\bibitem{boost.threadpool}
+Henkel, Philipp,
+``threadpool'',
+\url{http://threadpool.sourceforge.net}
+
+\end{thebibliography}
+
+\end{document}

http://git-wip-us.apache.org/repos/asf/thrift/blob/347a5ebb/doc/thrift-sasl-spec.txt
----------------------------------------------------------------------
diff --git a/doc/thrift-sasl-spec.txt b/doc/thrift-sasl-spec.txt
deleted file mode 100644
index 02cf79e..0000000
--- a/doc/thrift-sasl-spec.txt
+++ /dev/null
@@ -1,108 +0,0 @@
-A Thrift SASL message shall be a byte array of the following form:
-
-| 1-byte status code | 4-byte payload length | variable-length payload |
-
-The length fields shall be interpreted as integers, with the high byte sent
-first. This indicates the length of the field immediately following it, not
-including the status code or the length bytes.
-
-The possible status codes are:
-
-0x01 - START - Hello, let's go on a date.
-0x02 - OK - Everything's been going alright so far, let's see each other again.
-0x03 - BAD - I understand what you're saying. I really do. I just don't like it. We have to break up.
-0x04 - ERROR - We can't go on like this. It's like you're speaking another language.
-0x05 - COMPLETE - Will you marry me?
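-
-As an illustration of the framing above, the following sketch encodes and
-decodes a single message. The function names and the SASL_ prefixed enum
-values are assumptions made for this example, not part of any Thrift library.
-

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Status codes from the list above; the SASL_ prefix is an assumption.
enum SaslStatus {
  SASL_START = 0x01,
  SASL_OK = 0x02,
  SASL_BAD = 0x03,
  SASL_ERROR = 0x04,
  SASL_COMPLETE = 0x05
};

// Builds: 1-byte status | 4-byte length, high byte first | payload.
inline std::vector<uint8_t> encodeSaslMessage(SaslStatus status,
                                              const std::string& payload) {
  assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in 4 bytes
  const uint32_t len = static_cast<uint32_t>(payload.size());
  std::vector<uint8_t> out;
  out.reserve(5 + payload.size());
  out.push_back(static_cast<uint8_t>(status));
  out.push_back(static_cast<uint8_t>((len >> 24) & 0xFF));
  out.push_back(static_cast<uint8_t>((len >> 16) & 0xFF));
  out.push_back(static_cast<uint8_t>((len >> 8) & 0xFF));
  out.push_back(static_cast<uint8_t>(len & 0xFF));
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}

// Parses one message from the front of the buffer. Returns false if the
// buffer is shorter than the header or than the declared payload length.
inline bool decodeSaslMessage(const std::vector<uint8_t>& in,
                              uint8_t* status,
                              std::string* payload) {
  if (in.size() < 5) return false;
  const uint32_t len = (static_cast<uint32_t>(in[1]) << 24) |
                       (static_cast<uint32_t>(in[2]) << 16) |
                       (static_cast<uint32_t>(in[3]) << 8) |
                       static_cast<uint32_t>(in[4]);
  if (in.size() - 5 < len) return false;
  *status = in[0];
  payload->assign(in.begin() + 5, in.begin() + 5 + len);
  return true;
}
```

-
-The length counts only the payload, so an empty payload is sent as the
-status byte followed by four zero bytes.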
-
-The Thrift SASL communication will proceed as follows:
-
-1. The client is configured at instantiation of the transport with a single
-underlying SASL security mechanism that it supports.
-
-2. The server is configured with a mapping of underlying security mechanism
-name -> mechanism options.
-
-3. At connection time, the client will initiate communication by sending the
-server a START message. The payload of this message will be the name of the
-underlying security mechanism that the client would like to use.
-This mechanism name shall be 1-20 characters in length, and follow the
-specifications for SASL mechanism names specified in RFC 2222.
-
-4. The server receives this message and, if the mechanism name provided is
-among the set of mechanisms this server transport is configured to accept,
-appropriate initialization of the underlying security mechanism may take place.
-If the mechanism name is not one which the server is configured to support, the
-server shall return the BAD byte, followed by a 4-byte, potentially zero-value
-message length, followed by the potentially zero-length payload which may be a
-status code or message indicating failure. No further communication may take
-place via this transport. If the mechanism name is one which the server
-supports, then proceed to step 5.
-
-5. Following the START message, the client must send another message containing
-the "initial response" of the chosen SASL implementation. The client may send
-this message piggy-backed on the "START" message of step 3. The message type
-of this message must be either "OK" or "COMPLETE", depending on whether the
-SASL implementation indicates that this side of the authentication has been
-satisfied.
-
-6. The server then provides the byte array of the payload received to its
-underlying security mechanism. A challenge is generated by the underlying
-security mechanism on the server, and this is used as the payload for a message
-sent to the client. This message shall consist of an OK byte, followed by the
-non-zero message length word, followed by the payload.
-
-7. The client receives this message from the server and passes the payload to
-its underlying security mechanism to generate a response. The client then sends
-the server an OK byte, followed by the non-zero-value length of the response,
-followed by the bytes of the response as the payload.
-
-8. Steps 6 and 7 are repeated until both security mechanisms are satisfied with
-the challenge/response exchange. When either side has completed its security
-protocol, its next message shall be the COMPLETE byte, followed by a 4-byte
-potentially zero-value length word, followed by a potentially zero-length
-payload. This payload will be empty except for those underlying security
-mechanisms which provide additional data with success.
-
-If at any point in time either side is able to interpret the challenge or
-response sent by the other, but is dissatisfied with the contents thereof, this
-side should send the other a BAD byte, followed by a 4-byte potentially
-zero-value length word, followed by an optional, potentially zero-length
-message encoded in UTF-8 indicating failure. This message should be passed to
-the protocol above the thrift transport by whatever mechanism is appropriate
-and idiomatic for the particular language these thrift bindings are for.
-
-If at any point in time either side fails to interpret the challenge or
-response sent by the other, this side should send the other an ERROR byte,
-followed by a 4-byte potentially zero-value length word, followed by an
-optional, potentially zero-length message encoded in UTF-8. This message should
-be passed to the protocol above the thrift transport by whatever mechanism is
-appropriate and idiomatic for the particular language these thrift bindings are
-for.
-
-If step 8 completes successfully, then the communication is considered
-authenticated and subsequent communication may commence.
-
-If step 8 fails to complete successfully, then no further communication may
-take place via this transport.
-
-9. All writes to the underlying transport must be prefixed by the 4-byte length
-of the payload data, followed by the payload. All reads from this transport
-should read the 4-byte length word, then read the full quantity of bytes
-specified by this length word.
-
-If no SASL QOP (quality of protection) is negotiated during steps 6 and 7, then
-all subsequent writes to/reads from this transport are written/read unaltered,
-save for the length prefix, to the underlying transport.
-
-If a SASL QOP is negotiated, then this must be used by the Thrift transport for
-all subsequent communication. This is done by wrapping subsequent writes to the
-transport using the underlying security mechanism, and unwrapping subsequent
-reads from the underlying transport. Note that in this case, the length prefix
-of the write to the underlying transport is the length of the data after it has
-been wrapped by the underlying security mechanism. Note that the complete
-message must be read before giving this data to the underlying security
-mechanism for unwrapping.
-
-If at any point in time reading of a message fails either because of a
-malformed length word or failure to unwrap by the underlying security
-mechanism, then all further communication on this transport must cease.

http://git-wip-us.apache.org/repos/asf/thrift/blob/347a5ebb/doc/thrift.bnf
----------------------------------------------------------------------
diff --git a/doc/thrift.bnf b/doc/thrift.bnf
deleted file mode 100644
index 24d83f6..0000000
--- a/doc/thrift.bnf
+++ /dev/null
@@ -1,96 +0,0 @@
-Thrift Protocol Structure
-
-Last Modified: 2007-Jun-29
-
---------------------------------------------------------------------
-
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
-
---------------------------------------------------------------------
-
-This document describes the structure of the Thrift protocol
-without specifying the encoding. Thus, the order of elements
-could in some cases be rearranged depending upon the TProtocol
-implementation, but this document specifies the minimum required
-structure. There are some "dumb" terminals like STRING and INT
-that take the place of an actual encoding specification.
-
-The key point to notice is that ALL messages are just one wrapped
-<struct>. Depending upon the message type, the <struct> can be
-interpreted as the argument list to a function, the return value
-of a function, or an exception.
-
---------------------------------------------------------------------
-
-       <message> ::= <message-begin> <struct> <message-end>
-
- <message-begin> ::= <method-name> <message-type> <message-seqid>
-
-   <method-name> ::= STRING
-
-  <message-type> ::= T_CALL | T_REPLY | T_EXCEPTION
-
- <message-seqid> ::= I32
-
-        <struct> ::= <struct-begin> <field>* <field-stop> <struct-end>
-
-  <struct-begin> ::= <struct-name>
-
-   <struct-name> ::= STRING
-
-    <field-stop> ::= T_STOP
-
-         <field> ::= <field-begin> <field-data> <field-end>
-
-   <field-begin> ::= <field-name> <field-type> <field-id>
-
-    <field-name> ::= STRING
-
-    <field-type> ::= T_BOOL | T_BYTE | T_I8 | T_I16 | T_I32 | T_I64 | T_DOUBLE
-                     | T_STRING | T_BINARY | T_STRUCT | T_MAP | T_SET | T_LIST
-
-      <field-id> ::= I16
-
-    <field-data> ::= I8 | I16 | I32 | I64 | DOUBLE | STRING | BINARY
-                     | <struct> | <map> | <list> | <set>
-
-           <map> ::= <map-begin> <field-data>* <map-end>
-
-     <map-begin> ::= <map-key-type> <map-value-type> <map-size>
-
-  <map-key-type> ::= <field-type>
-
-<map-value-type> ::= <field-type>
-
-      <map-size> ::= I32
-
-          <list> ::= <list-begin> <field-data>* <list-end>
-
-    <list-begin> ::= <list-elem-type> <list-size>
-
-<list-elem-type> ::= <field-type>
-
-     <list-size> ::= I32
-
-           <set> ::= <set-begin> <field-data>* <set-end>
-
-     <set-begin> ::= <set-elem-type> <set-size>
-
- <set-elem-type> ::= <field-type>
-
-      <set-size> ::= I32