Posted to user@thrift.apache.org by Piscium <gr...@gmail.com> on 2012/03/10 10:46:07 UTC

usage without RPC

I am looking at Thrift to see if I can use it in a little project of
mine. So far I built Thrift without errors, but have not tested it
yet. And I went through the documentation on the wiki.

I have a couple of questions.

Thrift generates RPC code for client and server. I don't need that. I
am just looking for a tool that I can use to serialize and deserialize
to a string. Can I use Thrift for this narrow purpose? If yes, what
would be the equivalent of SerializeToString in Protocol Buffers?

From what I read it should be possible to serialize in a C++ function
and deserialize in a C function. Many languages are supported in
Thrift. Is the support for C_GLIB as good and stable as that for C++
for my intended narrow purpose of serializing to a string?

Re: usage without RPC

Posted by Rush Manbert <ru...@manbert.com>.
On Mar 13, 2012, at 4:27 PM, Piscium wrote:

> On 13 March 2012 00:17, Rush Manbert <ru...@manbert.com> wrote:
> 
>> Hi Piscium,
>> 
>> If we're talking about a homogeneous implementation, say all C++, then I would just use STL objects in the interface and not mess with serialization, because anything you can put together in the Thrift IDL you can just as easily define with STL.
>> 
>> But if you want serialization, then you are on exactly the right track. Just use the binary protocol and the TMemoryBuffer transport, then pass the buffer to the callee and reconstitute it there. That works just fine, but can get a little unwieldy if you do it a lot. In your case, it sounds like a single call, so that should be easy and maintainable.
>> 
>> I should probably also mention that the downside of doing this is that, without adding identity information, you are relying on the caller to be well behaved and not pass you a buffer that contains things you don't expect. Buffer overflow attacks come to mind...
> 
> Hi Rush,
> 
> Thanks a lot for the helpful suggestions and information.
> 
> STL: it's a possibility, though my thought is that using Thrift as an
> interface would make the visualization function easier to call from,
> let's say, Python, though this would be for the long term future.

If you want to go cross-language, then Thrift serialization is definitely the better choice.

> 
> Security: again, for now this would be just a desktop application and
> the visualization function would not be available from the internet,
> so I am not much concerned about it.

I thought so, but figured I should mention it.

> 
> To conclude: I have not decided yet what to use, I will probably do a
> few more tests in order to make up my mind. Another library I
> considered is HDF5, which is normally used to store structured data in
> big files. Thrift has a range of options for transport layer, one of
> which is memory buffer. HDF5 likewise has a range of options for
> file type, one of which is memory file. One difference is that I don't
> think that HDF5 has a compiler, so that means a good deal of
> boilerplate code. I have no clue about how performance would compare.
> Although HDF5 is optimized for big files there are some settings that
> can be tweaked to have better performance for small files.

I don't know anything about HDF5, but I have a LOT of experience with Thrift. It's easy and intuitive and fast, and it's not too hard to integrate into your build process. We use it many different ways, and I heartily recommend it.

> 
> We will see what happens.

Best of luck to you.

- Rush


Re: usage without RPC

Posted by Piscium <gr...@gmail.com>.
On 13 March 2012 00:17, Rush Manbert <ru...@manbert.com> wrote:

> Hi Piscium,
>
> If we're talking about a homogeneous implementation, say all C++, then I would just use STL objects in the interface and not mess with serialization, because anything you can put together in the Thrift IDL you can just as easily define with STL.
>
> But if you want serialization, then you are on exactly the right track. Just use the binary protocol and the TMemoryBuffer transport, then pass the buffer to the callee and reconstitute it there. That works just fine, but can get a little unwieldy if you do it a lot. In your case, it sounds like a single call, so that should be easy and maintainable.
>
> I should probably also mention that the downside of doing this is that, without adding identity information, you are relying on the caller to be well behaved and not pass you a buffer that contains things you don't expect. Buffer overflow attacks come to mind...

Hi Rush,

Thanks a lot for the helpful suggestions and information.

STL: it's a possibility, though my thought is that using Thrift as an
interface would make the visualization function easier to call from,
let's say, Python, though this would be for the long term future.

Security: again, for now this would be just a desktop application and
the visualization function would not be available from the internet,
so I am not much concerned about it.

To conclude: I have not decided yet what to use, I will probably do a
few more tests in order to make up my mind. Another library I
considered is HDF5, which is normally used to store structured data in
big files. Thrift has a range of options for transport layer, one of
which is memory buffer. HDF5 likewise has a range of options for
file type, one of which is memory file. One difference is that I don't
think that HDF5 has a compiler, so that means a good deal of
boilerplate code. I have no clue about how performance would compare.
Although HDF5 is optimized for big files there are some settings that
can be tweaked to have better performance for small files.

We will see what happens.

Re: usage without RPC

Posted by Rush Manbert <ru...@manbert.com>.
On Mar 12, 2012, at 3:39 PM, Piscium wrote:

> On 12 March 2012 17:16, Rush Manbert <ru...@manbert.com> wrote:
>> Of course you can use Thrift without using the RPC part. We do this in many different forms, mostly with custom protocols that we have derived from the binary protocol.
>> 
>> The attachment contains a C++ file that uses the TFileTransport to write data into a file and then read it back. Note that it includes ThriftTest.h, which is generated from thriftSrcDistro/test/ThriftTest.thrift.
>> 
>> If you use the JSON protocol (by commenting out line 20 that defines DENSE and uncommenting line 18 that defines JSON), you will get the data serialized as a string. The binary protocol also works. I'm not sure about the dense protocol.
>> 
>> You can see that the basic method to serialize is to make a Thrift structure, then call its write() method, passing a protocol.
>> 
>> I can't comment on the C support. Haven't tried it.
> 
> Hi Rush,
> 
> Thanks for taking the time to answer my query. In hindsight I should
> not have mentioned the function SerializeToString as I really don't
> know for sure what it does, and per your answer it probably does not
> do what I thought it did. So I will now say a few words about what I
> am trying to accomplish.
> 
> I have a main program that gets data from a few places (example,
> database), takes some user input, does some calculations and then
> calls a visualization function to present the data to the user (or
> print it, or save to file). The problem I face is how the main program
> should pass the data to that function. Because the data is
> heterogeneous and has a somewhat complicated structure I am exploring
> the idea of the main program serializing the data, which will then be
> deserialized by the function.
> 
> There are several advantages to this approach: a clean interface
> between the program and function that is documented in a readable
> format in the .thrift file, and easier maintenance. Obviously there is
> a cost in terms of serialization overhead, though my guess is that the
> overhead would be less than 20% of the time taken to create the
> display so I don't mind.
> 
> Looking at Thrift, it seems that the best would be to use the binary
> protocol. As for transport layer my _guess_ is that what I need is a
> memory buffer. Do you know if such a memory buffer could be used for
> my intended purpose, that is, pass data from the core of the program
> to the function?

Hi Piscium,

If we're talking about a homogeneous implementation, say all C++, then I would just use STL objects in the interface and not mess with serialization, because anything you can put together in the Thrift IDL you can just as easily define with STL.

But if you want serialization, then you are on exactly the right track. Just use the binary protocol and the TMemoryBuffer transport, then pass the buffer to the callee and reconstitute it there. That works just fine, but can get a little unwieldy if you do it a lot. In your case, it sounds like a single call, so that should be easy and maintainable.

I should probably also mention that the downside of doing this is that, without adding identity information, you are relying on the caller to be well behaved and not pass you a buffer that contains things you don't expect. Buffer overflow attacks come to mind...
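A minimal sketch of that round trip, and of the defensive checks the callee would want, might look like the following. It does not use the Thrift library: PlotRequest, serialize and deserialize are hypothetical names, and the byte layout (big-endian 32-bit integers, length-prefixed strings, no field headers) only imitates how the binary protocol writes those two value types. With Thrift itself, the generated structure, the binary protocol and TMemoryBuffer would do this work.

```cpp
// Hand-rolled sketch, NOT the Thrift library. All names are hypothetical.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical payload passed from the main program to the visualization function.
struct PlotRequest {
    int32_t pointCount = 0;
    std::string title;
};

// Append a 32-bit value in big-endian byte order.
static void putU32(std::string& out, uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((v >> shift) & 0xFF));
}

// Read a big-endian 32-bit value, refusing to run past the end of the buffer.
static uint32_t getU32(const std::string& in, size_t& pos) {
    if (pos > in.size() || in.size() - pos < 4)
        throw std::runtime_error("truncated buffer");
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | static_cast<uint8_t>(in[pos++]);
    return v;
}

// Caller side: flatten the structure into an in-memory byte string.
std::string serialize(const PlotRequest& r) {
    assert(r.title.size() <= UINT32_MAX);  // length must fit the 32-bit prefix
    std::string out;
    putU32(out, static_cast<uint32_t>(r.pointCount));
    putU32(out, static_cast<uint32_t>(r.title.size()));
    out += r.title;
    return out;
}

// Callee side: rebuild the structure, validating every length before use.
PlotRequest deserialize(const std::string& in) {
    size_t pos = 0;
    PlotRequest r;
    r.pointCount = static_cast<int32_t>(getU32(in, pos));
    uint32_t len = getU32(in, pos);
    if (in.size() - pos < len)
        throw std::runtime_error("string length exceeds buffer");
    r.title = in.substr(pos, len);
    pos += len;
    if (pos != in.size())
        throw std::runtime_error("unexpected trailing bytes");
    return r;
}
```

The caller would build a PlotRequest, call serialize(), and hand the resulting string to the callee, which calls deserialize(). The length checks are the part that matters for the point about badly behaved callers: a declared string length is never trusted until it has been compared against the bytes actually present.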

Best regards,
Rush

Re: usage without RPC

Posted by Piscium <gr...@gmail.com>.
On 12 March 2012 17:16, Rush Manbert <ru...@manbert.com> wrote:
> Of course you can use Thrift without using the RPC part. We do this in many different forms, mostly with custom protocols that we have derived from the binary protocol.
>
> The attachment contains a C++ file that uses the TFileTransport to write data into a file and then read it back. Note that it includes ThriftTest.h, which is generated from thriftSrcDistro/test/ThriftTest.thrift.
>
> If you use the JSON protocol (by commenting out line 20 that defines DENSE and uncommenting line 18 that defines JSON), you will get the data serialized as a string. The binary protocol also works. I'm not sure about the dense protocol.
>
> You can see that the basic method to serialize is to make a Thrift structure, then call its write() method, passing a protocol.
>
> I can't comment on the C support. Haven't tried it.

Hi Rush,

Thanks for taking the time to answer my query. In hindsight I should
not have mentioned the function SerializeToString as I really don't
know for sure what it does, and per your answer it probably does not
do what I thought it did. So I will now say a few words about what I
am trying to accomplish.

I have a main program that gets data from a few places (example,
database), takes some user input, does some calculations and then
calls a visualization function to present the data to the user (or
print it, or save to file). The problem I face is how the main program
should pass the data to that function. Because the data is
heterogeneous and has a somewhat complicated structure I am exploring
the idea of the main program serializing the data, which will then be
deserialized by the function.

There are several advantages to this approach: a clean interface
between the program and function that is documented in a readable
format in the .thrift file, and easier maintenance. Obviously there is
a cost in terms of serialization overhead, though my guess is that the
overhead would be less than 20% of the time taken to create the
display so I don't mind.

Looking at Thrift, it seems that the best would be to use the binary
protocol. As for transport layer my _guess_ is that what I need is a
memory buffer. Do you know if such a memory buffer could be used for
my intended purpose, that is, pass data from the core of the program
to the function?

Re: usage without RPC

Posted by Rush Manbert <ru...@manbert.com>.
Of course you can use Thrift without using the RPC part. We do this in many different forms, mostly with custom protocols that we have derived from the binary protocol.

The attachment contains a C++ file that uses the TFileTransport to write data into a file and then read it back. Note that it includes ThriftTest.h, which is generated from thriftSrcDistro/test/ThriftTest.thrift.

If you use the JSON protocol (by commenting out line 20 that defines DENSE and uncommenting line 18 that defines JSON), you will get the data serialized as a string. The binary protocol also works. I'm not sure about the dense protocol.

You can see that the basic method to serialize is to make a Thrift structure, then call its write() method, passing a protocol.
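The shape of that idea can be sketched without the attachment. The code below is not the Thrift API and not the attached file: Protocol, BinaryLikeProtocol, TextLikeProtocol and ChartData are hypothetical names, and the text output is only JSON-like, not the format Thrift's JSON protocol actually emits. It is meant to show why one write() method can produce either a readable string or compact bytes, depending on the protocol passed in.

```cpp
// Sketch only: these classes imitate the idea, not the Thrift API.
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical protocol interface: decides how primitive values are encoded.
class Protocol {
public:
    virtual ~Protocol() = default;
    virtual void writeI32(const std::string& name, int32_t v) = 0;
    virtual void writeString(const std::string& name, const std::string& v) = 0;
    virtual std::string result() const = 0;
};

// Binary-style encoding: big-endian integers, length-prefixed strings.
class BinaryLikeProtocol : public Protocol {
    std::string buf_;
    void putU32(uint32_t v) {
        for (int shift = 24; shift >= 0; shift -= 8)
            buf_.push_back(static_cast<char>((v >> shift) & 0xFF));
    }
public:
    void writeI32(const std::string&, int32_t v) override {
        putU32(static_cast<uint32_t>(v));
    }
    void writeString(const std::string&, const std::string& v) override {
        assert(v.size() <= UINT32_MAX);  // length must fit the 32-bit prefix
        putU32(static_cast<uint32_t>(v.size()));
        buf_ += v;
    }
    std::string result() const override { return buf_; }
};

// Text encoding: a readable, JSON-like string (no escaping, for brevity).
class TextLikeProtocol : public Protocol {
    std::string body_;
    void sep() { if (!body_.empty()) body_ += ","; }
public:
    void writeI32(const std::string& name, int32_t v) override {
        sep();
        body_ += "\"" + name + "\":" + std::to_string(v);
    }
    void writeString(const std::string& name, const std::string& v) override {
        sep();
        body_ += "\"" + name + "\":\"" + v + "\"";
    }
    std::string result() const override { return "{" + body_ + "}"; }
};

// Hypothetical structure; with Thrift this would be generated from the .thrift file.
struct ChartData {
    int32_t pointCount = 0;
    std::string title;
    // The structure describes its fields; the protocol chooses the encoding.
    void write(Protocol& p) const {
        p.writeI32("pointCount", pointCount);
        p.writeString("title", title);
    }
};
```

Calling write() on the same ChartData with a TextLikeProtocol yields a readable string, while a BinaryLikeProtocol yields a compact byte string; the structure's code does not change.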

I can't comment on the C support. Haven't tried it.

- Rush