You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@thrift.apache.org by Bruce Simpson <bm...@incunabulum.net> on 2009/11/24 01:24:25 UTC
[PATCH] Add TProtocol::writeBinary(const uint8_t*, size_t)
Hi all,
To integrate Thrift into the XORP router project, the approach I've
taken is to modify XORP's existing IDL generator to produce code which
uses the Thrift protocols and transports directly.
This is mainly because the code base there already does asynchronous I/O
using its own mechanisms, and using Thrift's own generator doesn't
support this use. I explored both of the ASIO drops (from Rush Manbert
and Spotify), and it was pretty clear that moving to ASIO in our tree
would require significantly more refactoring.
The modified XIF parser generates RPC stubs, which call TProtocol
methods, and present exactly the same method signatures as the existing
code, to limit code modification of XORP to only those parts of the tree
needed to implement Thrift as the RPC layer.
As XIF is mostly a subset of Thrift, I wrote a translator which parses
XIF and produces the equivalent .thrift file, which should allow Thrift
clients to talk to these services directly (providing they use
TBinaryProtocol, TFramedTransport, TCP sockets, and know the endpoint
address for each service).
Because of the nature of the service (IP routing and link management),
we get into situations where we need to pass binary blobs back and forth
as part of the RPC calls we're servicing. The XIF IDL already deals with
this situation, however, in XIF the base type used is vector<uint8_t>.
My stub translator compiles just fine, however, I can end up with
constructs like this:
[given 's' as a vector<uint8_t> argument]
nout += outp->writeBinary(string(reinterpret_cast<const
char*>(&s[0]), s.size()));
Now this is pretty nasty, because the C++ standard libary has no 'const
string&' specialization; indeed, 'const string&' just specifies the
storage class -- as far as C++ is concerned, it isn't a distinct type,
and the code construct above will go off to allocate an intermediate
buffer from the C++ runtime heap. Probably not what we want on a
critical path.
I looked around to see if there were a quick fix in terms of a container
adaptor, no dice. This seems to be a more general problem -- when
working with string literals in C++, 'const char*' and a 'size_t' seems
to be about as good as it gets.
It's likely I'd also need to take a similar approach for the read path,
my stub generator currently has to bounce buffer.
The TProtocol::readBinary() method probably needs a little bit more
thinking about, and I'll come back to that -- I notice
TBinaryProtocol::readBinary() is taking pains not to abuse 'auto'
(stack) class storage.
Example:
[given 's' as a vector<uint8_t> argument]
string tmp_s;
nin += inp->readBinary(tmp_s);
s.resize(tmp_s.size());
memcpy(&s[0], tmp_s.data(), tmp_s.size());
Here is a patch (against SVN trunk) for the C++ client library to
support C-style buffers, as well as 'const std::string&'. by overloading
the writeBinary() method in TProtocol and its derived classes.
This seems a more reasonable approach than limiting it to the use of
std::vector<T>; type safety is side-stepped for a very specific reason.
A 'gmake' runs to completion with this patch on FreeBSD 7.2-STABLE/amd64.
Comments, questions, suggestions highly appreciated.
cheers,
BMS
Re: Avoiding copies with TProtocol in C++
Posted by Bruce Simpson <bm...@incunabulum.net>.
Hi all,
Attached is a patch which implements approach #1 to eliding intermediate
copies from a method overload, TProtocol::readBinary(vector<uint8_t>&).
Bruce Simpson wrote:
> So it seems, to deal with my need for vector<uint8_t> in my existing
> application's RPC methods, without introducing unnecessary allocations
> or copies on the receive path. this boils down to two design choices:
> 1. Just go ahead and implement it in libthrift itself.
> 2. Add a helper method in my application's IPC library, specific to
> TBinaryProtocol, to read the blob from the TTransport directly.
>
> Approach #1:
> Thrift is already using <vector> anyway, but not for T = uint8_t.
> The additional cost of vector<uint8_t> usage in the library is zero
> for static linkage, if the linker knows to discard the template
> expansions from the text segment [6]. However, shared linkage is going
> introduce weak symbols for vector<uint8_t> methods due to template
> expansion, and thus add a small amount of runtime text cost to
> libthrift.so.
> If a limit is placed on the payload size upfront, things are a
> little easier, if say we wanted to implement
> TProtocol::readBinary(uint8_t* data, size_t& len), without requiring
> vector<uint8_t>.
> It is a more generic interface, but still the problem looms of how
> to resize the buffer if input is bigger than we expect, assuming the
> caller manages the buffer. Probably not a good idea to be using
> C-style buffer management here.
> Even so, there is no easy way to convert a uint8_t array to a
> vector without a copy [5].
>
> So it might be best just to add TProtocol::readBinary(vector<uint8_t>&
> buf) to the library.
I measure the .so additional code at just over 4KB with Thrift's default
optimization settings (-O2). The static library is slightly better. Most
of the additional code is in the ELF .text segment; the rest of the
overhead is probably in the dynsyms, and for the new method signatures.
The template expansions are pretty minimal, although I didn't profile
with -frepo; this appears to be broke in FreeBSD's base system compiler.
-fno-implicit-templates has per-translation-unit granularity, so it's
difficult to shuffle the vector<uint8_t> instantiation elsewhere,
without additional hacking.
Given that the Thrift code generator itself will marshal a byte[] array
quite differently vs the binary type, this is pretty specific to XORP's
use of Thrift, and it could be argued it doesn't belong in the library.
If so, is there a more elegant way to get data in and out of the
TProtocol interface, without additional copies, and still sharing the
implementation in libthrift.so as far as possible?
On the other hand, the Thrift generator will see the 'binary' type and
produce calls to the current TProtocol::readString() implementations,
and those are going to use intermediate copies, too. So it could also be
argued that this has general performance benefits.
Comments? Suggestions?
thanks,
BMS
Avoiding copies with TProtocol in C++
Posted by Bruce Simpson <bm...@incunabulum.net>.
[Alternative Subject line: TProtocol::readBinary(vector<uint8_t>& buf) ?]
I realize this is a bit of a long one, but I really wanted to make sure
I had covered all bases before writing code.
To recap:
I'm locked into using vector<uint8_t> for binary blobs in my
application, because a const vector<uint8_t>& is what the code above and
below the RPC is expecting.
But with good reasons: These are typically on a critical path, so we
want low latency; they involve packet payloads, and thus, we need them
to be contiguously addressable. They usually come straight to or from
socket syscalls, or even libpcap.
So, we don't use std::string for representing them. [-1]
Thrift's current implementations of TProtocol::readBinary() are
likely to introduce copies in this case, because they use std::string.
Bruce Simpson wrote:
> It's likely I'd also need to take a similar approach for the read
> path, my stub generator currently has to bounce buffer.
>
> The TProtocol::readBinary() method probably needs a little bit more
> thinking about, and I'll come back to that -- I notice
> TBinaryProtocol::readBinary() is taking pains not to abuse 'auto'
> (stack) class storage.
I want to avoid intermediate copies if at all possible, without
breaking Thrift's design assumption that TTransport has serial (ie not
random access) semantics. This would be desirable for strings, also,
especially if they get big.
In my situation, the caller generally wants to manage, or at least
instantiate, the buffer which the TProtocol instance reads into. This is
in order to avoid the cost of those intermediate copies, which can
quickly mount up when packets are being sent rapidly.
Here's that example again from XORP's Thrift code generator:
>
> Example:
> [given 's' as a vector<uint8_t> argument]
> string tmp_s;
> nin += inp->readBinary(tmp_s);
> s.resize(tmp_s.size());
> memcpy(&s[0], tmp_s.data(), tmp_s.size());
In the above example, including the transport, there are going to be up
to *5* copies of the same data for an in-flight RPC, not including
kernel socket buffers.
Let's count them:
1 for a TMemoryBuffer managed by my own TTransport class,
1 for its socket read buffer,
1 for the private buffer used by TBinaryProtocol itself,
1 for the std::string temporary which TBinaryProtocol::readBinary()
assigns to [0], and finally,
1 in the vector<uint8_t> I want to read the blob into.
Reasons being:
There is no getting away from the TMemoryBuffer, as my operations
are async in the client (and potentially need one in the server too if
we thread-pool), although I could consume it once the message is sent.
I can't get away from using a socket read buffer, because of the
need to re-acquire message boundaries from the stream, when TCP is used.
SCTP wouldn't have this problem, but that's academic right now. This
cost is part of TNonblockingServer anyway, although I am using my own
TTransport-derived class.
The std::string temporary in TBinaryProtocol::readBinary() is
probably going to get elided by the fact it's an lvalue, but perhaps we
should do what TCompactProtocol does and be explicit about assignment,
to benefit from any library optimizations. [0]
In all cases of TProtocol derived classes, the size of the blob being
marshaled is known upfront, so a reallocation often isn't needed if a
buffer is already to hand. This optimization already exists in
TBinaryProtocol for strings.
For TJSONProtocol, there is no avoiding intermediate buffering of
readBinary(), as base64 decode needs to happen. We could 'peel off' the
str instance by reading into it, and overwriting each byte as we decode
it, which would eliminate another copy; as that method is implemented in
the SVN trunk, it's easier to append to str.
More background:
Part of the problem is that std::string has no notion of shared
storage, at the level of its API. [1].
TBinaryProtocol::readBinary(std::string& str) tries to avoid allocating
stack temporaries. It uses a buffer, managed with the C heap allocator,
private to the class instance. [2]
The std::string constructor called to assign to the 'str' argument
will cause the default allocator to be called, to manage its own copy of
this buffer; another intermediate copy, as std::string has no knowledge
of this buffer's allocator semantics. [3] [4]
Without hacking allocators, or adding another method to the library: it
seems hard, to my mind, to come up with a compiler and library agnostic
solution to my problem, that of coaxing Thrift to read into a
vector<uint8_t> without further copying.
Reasons being:
* Whilst vector<T> has similar methods to std::string for free store
management, exploiting the method similarity would require a template
specialization. No way, jose -- that's largely academic, we are not
doing template methods in TProtocol.
* I looked around to see if there were any constructor tricks I
could use to trick vector<T> into using pre-allocated storage, or easily
casting to it without a copy. No dice. [5]
* I'd just end up reinventing shared_ptr<T>; vector<T> has no
knowledge of how to share and refcount storage, unless you explicitly
tell it to. My application doesn't know about shared_ptr<T> yet, and
using this would mean changing our app's existing code, something I'm
keen to avoid given its size.
[on uint8_t/size_t vs vector<T>]
> This seems a more reasonable approach than limiting it to the use of
> std::vector<T>; type safety is side-stepped for a very specific
> reason. A 'gmake' runs to completion with this patch on FreeBSD
> 7.2-STABLE/amd64.
So it seems, to deal with my need for vector<uint8_t> in my existing
application's RPC methods, without introducing unnecessary allocations
or copies on the receive path. this boils down to two design choices:
1. Just go ahead and implement it in libthrift itself.
2. Add a helper method in my application's IPC library, specific to
TBinaryProtocol, to read the blob from the TTransport directly.
Approach #1:
Thrift is already using <vector> anyway, but not for T = uint8_t.
The additional cost of vector<uint8_t> usage in the library is zero
for static linkage, if the linker knows to discard the template
expansions from the text segment [6]. However, shared linkage is going
introduce weak symbols for vector<uint8_t> methods due to template
expansion, and thus add a small amount of runtime text cost to libthrift.so.
If a limit is placed on the payload size upfront, things are a
little easier, if say we wanted to implement
TProtocol::readBinary(uint8_t* data, size_t& len), without requiring
vector<uint8_t>.
It is a more generic interface, but still the problem looms of how
to resize the buffer if input is bigger than we expect, assuming the
caller manages the buffer. Probably not a good idea to be using C-style
buffer management here.
Even so, there is no easy way to convert a uint8_t array to a vector
without a copy [5].
So it might be best just to add TProtocol::readBinary(vector<uint8_t>&
buf) to the library.
Approach #2:
On the other hand, if we hard-wire everything in our app to
TBinaryProtocol, and explicitly add a helper method, the problem of
doing it generically in the Thrift C++ client library, libthrift.so,
goes away.
I already have a few for encapsulating XORP XRL transport exceptions
as TApplicationExceptions. (I did it this way mainly to avoid spamming
the translated IDL with exception {} decls.)
It would be useful at some point later on to have access to the
other protocols, but TBinaryProtocol probably works best for us.
This means less flexibility for our app, but doesn't introduce
vector<uint8_t> to libthrift.so.
I would prefer to solve the problem generally, and push a patch
upstream, so others can benefit.
At times like this, I wish C++ had Python's buffer management.
On the other hand, this seems like a more general problem: i.e. that of
avoiding intermediate copies on the critical path, whilst preserving
TTransport's serial access semantics.
I'm sure others may also need to get data in and out of Thrift like
this, and an option to stereotype a byte[] array as having these
semantics in the generated C++ code may be useful.
Comments? Suggestions?
regards,
BMS
[-1] Before Thrift, there is probably intermediate copying going on.
Long story.
[0] TCompactProtocol::readBinary() is more explicit, it uses
std::string::assign() on the instance passed to it.
[1] Although implementations can, and often do, implement tricks inside
the C++ standard library, to share storage between instances of
std::string which are associated with each other by assignment or
construction; GNU libstdc++ refcounts the backing store for its
implementation.
[2] Folk paying heed to thread safety will note that
TBinaryProtocol::readBinary() has no re-entrancy guarantee, because of
its private buffer; regardless of whether the C/C++ heap allocator is
re-entrant or not.
[3] Once again: there is no const std::string specialization, so no
opportunity for the compiler to elide the allocation, and just emit a
reference to const storage.
[4] In this case, because not all of the objects involved are instances
of std::string, we can't benefit from optimization in [1], and even if
they were, this optimization is implementation-defined.
[5] Same as for [3]; a const vector<T> gets no special treatment.
[6] LLVM will potentially be very smart at this later on.