You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@thrift.apache.org by Bruce Simpson <bm...@incunabulum.net> on 2009/11/24 01:24:25 UTC

[PATCH] Add TProtocol::writeBinary(const uint8_t*, size_t)

Hi all,

To integrate Thrift into the XORP router project, the approach I've 
taken is to modify XORP's existing IDL generator to produce code which 
uses the Thrift protocols and transports directly.

This is mainly because the code base there already does asynchronous I/O 
using its own mechanisms, and using Thrift's own generator doesn't 
support this use. I explored both of the ASIO drops (from Rush Manbert 
and Spotify), and it was pretty clear that moving to ASIO in our tree 
would require significantly more refactoring.

The modified XIF parser generates RPC stubs, which call TProtocol 
methods, and present exactly the same method signatures as the existing 
code, to limit code modification of XORP to only those parts of the tree 
needed to implement Thrift as the RPC layer.

As XIF is mostly a subset of Thrift, I wrote a translator which parses 
XIF and produces the equivalent .thrift file, which should allow Thrift 
clients to talk to these services directly (providing they use 
TBinaryProtocol, TFramedTransport, TCP sockets, and know the endpoint 
address for each service).

Because of the nature of the service (IP routing and link management), 
we get into situations where we need to pass binary blobs back and forth 
as part of the RPC calls we're servicing. The XIF IDL already deals with 
this situation, however, in XIF the base type used is vector<uint8_t>.

My stub translator compiles just fine, however, I can end up with 
constructs like this:
[given 's' as a vector<uint8_t> argument]
    nout += outp->writeBinary(string(reinterpret_cast<const 
char*>(&s[0]), s.size()));

Now this is pretty nasty, because the C++ standard libary has no 'const 
string&' specialization; indeed, 'const string&' just specifies the 
storage class -- as far as C++ is concerned, it isn't a distinct type, 
and the code construct above will go off to allocate an intermediate 
buffer from the C++ runtime heap. Probably not what we want on a 
critical path.

I looked around to see if there were a quick fix in terms of a container 
adaptor, no dice. This seems to be a more general problem -- when 
working with string literals in C++, 'const char*' and a 'size_t' seems 
to be about as good as it gets.

It's likely I'd also need to take a similar approach for the read path, 
my stub generator currently has to bounce buffer.

The TProtocol::readBinary() method probably needs a little bit more 
thinking about, and I'll come back to that -- I notice 
TBinaryProtocol::readBinary() is taking pains not to abuse 'auto' 
(stack) class storage.

Example:
[given 's' as a vector<uint8_t> argument]
    string tmp_s;
    nin += inp->readBinary(tmp_s);
    s.resize(tmp_s.size());
    memcpy(&s[0], tmp_s.data(), tmp_s.size());

Here is a patch (against SVN trunk) for the C++ client library to 
support C-style buffers, as well as 'const std::string&'. by overloading 
the writeBinary() method in TProtocol and its derived classes.

This seems a more reasonable approach than limiting it to the use of 
std::vector<T>; type safety is side-stepped for a very specific reason. 
A 'gmake' runs to completion with this patch on FreeBSD 7.2-STABLE/amd64.

Comments, questions, suggestions highly appreciated.

cheers,
BMS

Re: Avoiding copies with TProtocol in C++

Posted by Bruce Simpson <bm...@incunabulum.net>.
Hi all,

Attached is a patch which implements approach #1 to eliding intermediate 
copies from a method overload, TProtocol::readBinary(vector<uint8_t>&).

Bruce Simpson wrote:
> So it seems, to deal with my need for vector<uint8_t> in my existing 
> application's RPC methods, without introducing unnecessary allocations 
> or copies on the receive path. this boils down to two design choices:
> 1.  Just go ahead and implement it in libthrift itself.
> 2. Add a helper method in my application's IPC library, specific to 
> TBinaryProtocol, to read the blob from the TTransport directly.
>
> Approach #1:
>    Thrift is already using <vector> anyway, but not for T = uint8_t.
>    The additional cost of vector<uint8_t> usage in the library is zero 
> for static linkage, if the linker knows to discard the template 
> expansions from the text segment [6]. However, shared linkage is going 
> introduce weak symbols for vector<uint8_t> methods due to template 
> expansion, and thus add a small amount of runtime text cost to 
> libthrift.so.
>    If a limit is placed on the payload size upfront, things are a 
> little easier, if say we wanted to implement 
> TProtocol::readBinary(uint8_t* data, size_t& len), without requiring 
> vector<uint8_t>.
>    It is a more generic interface, but still the problem looms of how 
> to resize the buffer if input is bigger than we expect, assuming the 
> caller manages the buffer. Probably not a good idea to be using 
> C-style buffer management here.
>    Even so, there is no easy way to convert a uint8_t array to a 
> vector without a copy [5].
>
> So it might be best just to add TProtocol::readBinary(vector<uint8_t>& 
> buf) to the library.

I measure the .so additional code at just over 4KB with Thrift's default 
optimization settings (-O2). The static library is slightly better. Most 
of the additional code is in the ELF .text segment; the rest of the 
overhead is probably in the dynsyms, and for the new method signatures.

The template expansions are pretty minimal, although I didn't profile 
with -frepo; this appears to be broke in FreeBSD's base system compiler. 
-fno-implicit-templates has per-translation-unit granularity, so it's 
difficult to shuffle the vector<uint8_t> instantiation elsewhere, 
without additional hacking.

Given that the Thrift code generator itself will marshal a byte[] array 
quite differently vs the binary type, this is pretty specific to XORP's 
use of Thrift, and it could be argued it doesn't belong in the library.
If so, is there a more elegant way to get data in and out of the 
TProtocol interface, without additional copies, and still sharing the 
implementation in libthrift.so as far as possible?

On the other hand, the Thrift generator will see the 'binary' type and 
produce calls to the current TProtocol::readString() implementations, 
and those are going to use intermediate copies, too. So it could also be 
argued that this has general performance benefits.

Comments? Suggestions?

thanks,
BMS


Avoiding copies with TProtocol in C++

Posted by Bruce Simpson <bm...@incunabulum.net>.
[Alternative Subject line: TProtocol::readBinary(vector<uint8_t>& buf) ?]

I realize this is a bit of a long one, but I really wanted to make sure 
I had covered all bases before writing code.

To recap:
    I'm locked into using vector<uint8_t> for binary blobs in my 
application, because a const vector<uint8_t>& is what the code above and 
below the RPC is expecting.
    But with good reasons: These are typically on a critical path, so we 
want low latency; they involve packet payloads, and thus, we need them 
to be contiguously addressable. They usually come straight to or from 
socket syscalls, or even libpcap.
    So, we don't use std::string for representing them. [-1]
    Thrift's current implementations of TProtocol::readBinary() are 
likely to introduce copies in this case, because they use std::string.

Bruce Simpson wrote:
> It's likely I'd also need to take a similar approach for the read 
> path, my stub generator currently has to bounce buffer.
>
> The TProtocol::readBinary() method probably needs a little bit more 
> thinking about, and I'll come back to that -- I notice 
> TBinaryProtocol::readBinary() is taking pains not to abuse 'auto' 
> (stack) class storage.

    I want to avoid intermediate copies if at all possible, without 
breaking Thrift's design assumption that TTransport has serial (ie not 
random access) semantics. This would be desirable for strings, also, 
especially if they get big.

    In my situation, the caller generally wants to manage, or at least 
instantiate, the buffer which the TProtocol instance reads into. This is 
in order to avoid the cost of those intermediate copies, which can 
quickly mount up when packets are being sent rapidly.

 Here's that example again from XORP's Thrift code generator:
>
> Example:
> [given 's' as a vector<uint8_t> argument]
>    string tmp_s;
>    nin += inp->readBinary(tmp_s);
>    s.resize(tmp_s.size());
>    memcpy(&s[0], tmp_s.data(), tmp_s.size());

In the above example, including the transport, there are going to be up 
to *5* copies of the same data for an in-flight RPC, not including 
kernel socket buffers.

Let's count them:
    1 for a TMemoryBuffer managed by my own TTransport class,
    1 for its socket read buffer,
    1 for the private buffer used by TBinaryProtocol itself,
    1 for the std::string temporary which TBinaryProtocol::readBinary() 
assigns to [0], and finally,
    1 in the vector<uint8_t> I want to read the blob into.

Reasons being:
    There is no getting away from the TMemoryBuffer, as my operations 
are async in the client (and potentially need one in the server too if 
we thread-pool), although I could consume it once the message is sent.
    I can't get away from using a socket read buffer, because of the 
need to re-acquire message boundaries from the stream, when TCP is used. 
SCTP wouldn't have this problem, but that's academic right now. This 
cost is part of TNonblockingServer anyway, although I am using my own 
TTransport-derived class.
    The std::string temporary in TBinaryProtocol::readBinary() is 
probably going to get elided by the fact it's an lvalue, but perhaps we 
should do what TCompactProtocol does and be explicit about assignment, 
to benefit from any library optimizations. [0]

In all cases of TProtocol derived classes, the size of the blob being 
marshaled is known upfront, so a reallocation often isn't needed if a 
buffer is already to hand. This optimization already exists in 
TBinaryProtocol for strings.

For TJSONProtocol, there is no avoiding intermediate buffering of 
readBinary(), as base64 decode needs to happen. We could 'peel off' the 
str instance by reading into it, and overwriting each byte as we decode 
it, which would eliminate another copy; as that method is implemented in 
the SVN trunk, it's easier to append to str.

More background:
    Part of the problem is that std::string has no notion of shared 
storage, at the level of its API. [1]. 
TBinaryProtocol::readBinary(std::string& str) tries to avoid allocating 
stack temporaries. It uses a buffer, managed with the C heap allocator, 
private to the class instance. [2]
    The std::string constructor called to assign to the 'str' argument 
will cause the default allocator to be called, to manage its own copy of 
this buffer; another intermediate copy, as std::string has no knowledge 
of this buffer's allocator semantics. [3] [4]

Without hacking allocators, or adding another method to the library: it 
seems hard, to my mind, to come up with a compiler and library agnostic 
solution to my problem, that of coaxing Thrift to read into a 
vector<uint8_t> without further copying.

Reasons being:
    * Whilst vector<T> has similar methods to std::string for free store 
management, exploiting the method similarity would require a template 
specialization. No way, jose -- that's largely academic, we are not 
doing template methods in TProtocol.
    * I looked around to see if there were any constructor tricks I 
could use to trick vector<T> into using pre-allocated storage, or easily 
casting to it without a copy. No dice. [5]
    * I'd just end up reinventing shared_ptr<T>; vector<T> has no 
knowledge of how to share and refcount storage, unless you explicitly 
tell it to. My application doesn't know about shared_ptr<T> yet, and 
using this would mean changing our app's existing code, something I'm 
keen to avoid given its size.


[on uint8_t/size_t vs vector<T>]
> This seems a more reasonable approach than limiting it to the use of 
> std::vector<T>; type safety is side-stepped for a very specific 
> reason. A 'gmake' runs to completion with this patch on FreeBSD 
> 7.2-STABLE/amd64.

So it seems, to deal with my need for vector<uint8_t> in my existing 
application's RPC methods, without introducing unnecessary allocations 
or copies on the receive path. this boils down to two design choices:
 1.  Just go ahead and implement it in libthrift itself.
 2. Add a helper method in my application's IPC library, specific to 
TBinaryProtocol, to read the blob from the TTransport directly.

Approach #1:
    Thrift is already using <vector> anyway, but not for T = uint8_t.
    The additional cost of vector<uint8_t> usage in the library is zero 
for static linkage, if the linker knows to discard the template 
expansions from the text segment [6]. However, shared linkage is going 
introduce weak symbols for vector<uint8_t> methods due to template 
expansion, and thus add a small amount of runtime text cost to libthrift.so.
    If a limit is placed on the payload size upfront, things are a 
little easier, if say we wanted to implement 
TProtocol::readBinary(uint8_t* data, size_t& len), without requiring 
vector<uint8_t>.
    It is a more generic interface, but still the problem looms of how 
to resize the buffer if input is bigger than we expect, assuming the 
caller manages the buffer. Probably not a good idea to be using C-style 
buffer management here.
    Even so, there is no easy way to convert a uint8_t array to a vector 
without a copy [5].

So it might be best just to add TProtocol::readBinary(vector<uint8_t>& 
buf) to the library.

Approach #2:
    On the other hand, if we hard-wire everything in our app to 
TBinaryProtocol, and explicitly add a helper method, the problem of 
doing it generically in the Thrift C++ client library, libthrift.so, 
goes away.
    I already have a few for encapsulating XORP XRL transport exceptions 
as TApplicationExceptions. (I did it this way mainly to avoid spamming 
the translated IDL with exception {} decls.)
    It would be useful at some point later on to have access to the 
other protocols, but TBinaryProtocol probably works best for us.

This means less flexibility for our app, but doesn't introduce 
vector<uint8_t> to libthrift.so.
I would prefer to solve the problem generally, and push a patch 
upstream, so others can benefit.

At times like this, I wish C++ had Python's buffer management.
On the other hand, this seems like a more general problem: i.e. that of 
avoiding intermediate copies on the critical path, whilst preserving 
TTransport's serial access semantics.

I'm sure others may also need to get data in and out of Thrift like 
this, and an option to stereotype a byte[] array as having these 
semantics in the generated C++ code may be useful.

Comments? Suggestions?

regards,
BMS

[-1] Before Thrift, there is probably intermediate copying going on. 
Long story.
[0] TCompactProtocol::readBinary() is more explicit, it uses 
std::string::assign() on the instance passed to it.
[1] Although implementations can, and often do, implement tricks inside 
the C++ standard library, to share storage between instances of 
std::string which are associated with each other by assignment or 
construction; GNU libstdc++ refcounts the backing store for its 
implementation.
[2] Folk paying heed to thread safety will note that 
TBinaryProtocol::readBinary() has no re-entrancy guarantee, because of 
its private buffer; regardless of whether the C/C++ heap allocator is 
re-entrant or not.
[3] Once again: there is no const std::string specialization, so no 
opportunity for the compiler to elide the allocation, and just emit a 
reference to const storage.
[4] In this case, because not all of the objects involved are instances 
of std::string, we can't benefit from optimization in [1], and even if 
they were, this optimization is implementation-defined.
[5] Same as for [3]; a const vector<T> gets no special treatment.
[6] LLVM will potentially be very smart at this later on.