Posted to user@thrift.apache.org by Johan Stuyts <j....@zybber.nl> on 2008/06/06 20:51:54 UTC

Re: [thrift] Serialization

Hi Marcus,

I am moving this discussion to the Thrift user list at Apache (incubator),  
because this list will be closed in the future. You can join this list by  
sending an e-mail with subject 'subscribe' (without the quotes) to:
thrift-user-request@incubator.apache.org

> The nature of a hashmap is that it stores objects which could be typed (or
> not) so the "normal" way of doing this is to just use plain Java
> serialization when you need to distribute the objects.
>
> I fear that the objects must be strongly typed and defined as structs right?

Correct. And Thrift only supports homogeneous collections, i.e. it is not
possible to put different types, or even subclasses of a type, in a
container. This limitation is there to allow protocols to send only
minimal information across the wire, unlike, for example, Java serialization,
which always includes all metadata.
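
To give a rough feel for the metadata overhead mentioned above, here is a
minimal sketch that writes the same int once as four raw bytes and once
through Java serialization. The class and method names (WireSizeDemo,
rawSize, javaSerializedSize) are made up for illustration, and the raw
variant is not Thrift's actual wire format; it only stands in for a
protocol where both sides already know the type.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: compares the encoded size of one int value.
class WireSizeDemo {
    // Writes only the four bytes of the value, as a protocol can do when
    // sender and receiver agree on the type in advance.
    static int rawSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Writes the value as a java.lang.Integer through Java serialization,
    // which adds a stream header and class description to the output.
    static int javaSerializedSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Integer.valueOf(value));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("raw int: " + rawSize(42) + " bytes");
        System.out.println("Java-serialized Integer: "
                + javaSerializedSize(42) + " bytes");
    }
}
```

The gap shrinks for large objects and for streams that reuse class
descriptions, so treat the numbers as an illustration rather than a
benchmark.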

-- 
Kind regards,

Johan Stuyts

Re: [thrift] Serialization

Posted by Marcus Herou <ma...@tailsweep.com>.
The thing is that I would like to establish a "shared" memory and/or storage
between Java apps and Perl, for instance. Something like memcached on acid.

Most of the time the byte[] will probably need to be some sort of string
representation anyway (JSON or Properties file syntax, maybe) so that every
client is able to handle the data.
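
As a sketch of the "string representation inside a byte[]" idea, the Java
side could look roughly like the following, using Properties file syntax
encoded as UTF-8. The class name PropertiesCodec is hypothetical, and this
assumes the Perl side parses the same key=value text (including the comment
line and the escapes that java.util.Properties writes).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper: turns Properties into UTF-8 bytes and back.
class PropertiesCodec {
    // Serializes the properties as key=value text, then encodes it as UTF-8.
    static byte[] encode(Properties props) {
        StringWriter text = new StringWriter();
        try {
            // A null comment is allowed; a date comment line is still written.
            props.store(text, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes and parses them as Properties file syntax.
    static Properties decode(byte[] data) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(new String(data, StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("name", "Marcus");
        byte[] stored = encode(props);
        System.out.println(decode(stored).getProperty("name"));
    }
}
```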

/M




On Sat, Jun 7, 2008 at 9:12 PM, Bryan Duxbury <br...@rapleaf.com> wrote:

> Yeah, don't use base64. There's no point to that if you have a binary data
> type available to you.
>
> Will the clients of this hashmap know the types of the objects they are
> retrieving? If so, my original suggestion of using binary types stored and
> de/serializing at the application layer probably still applies.
>
> -Bryan
>
>
> On Jun 7, 2008, at 9:58 AM, Marcus Herou wrote:
>
>> I'm not 100% sure why I switched, was thinking that the clients could send
>> Base64 encoded strings back and forths but to be honest that's a little
>> lame.
>>
>> It's 5 mins work to switch back and I will do so...
>>
>> /M
>>
>> On Sat, Jun 7, 2008 at 6:52 PM, Johan Stuyts <j....@zybber.nl> wrote:
>>
>>>> The only real constraint currently will be that the client need to
>>>> encode/decode objects to a string representation. I initially made the
>>>> cache store byte[] but switched to strings.
>>>
>>> Why did you make the switch? Thrift has a binary type which does what you
>>> need and is supported by all language bindings. Using 'binary' should
>>> remove the need to encode the data as strings on the client side.
>>>
>>> --
>>> Kind regards,
>>>
>>> Johan Stuyts
>>
>> --
>> Marcus Herou CTO and co-founder Tailsweep AB
>> +46702561312
>> marcus.herou@tailsweep.com
>> http://www.tailsweep.com/
>> http://blogg.tailsweep.com/
>


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/

Re: [thrift] Serialization

Posted by Bryan Duxbury <br...@rapleaf.com>.
Yeah, don't use base64. There's no point to that if you have a binary  
data type available to you.
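
To put a number on the cost of Base64, here is a small sketch using
java.util.Base64 from the JDK. The class name Base64Overhead is made up for
illustration. Base64 emits four characters for every three input bytes, so
the encoded text is roughly one third larger than the binary payload.

```java
import java.util.Base64;

// Hypothetical demo class: measures how much Base64 inflates a payload.
class Base64Overhead {
    // Returns the number of characters needed to carry a payload of the
    // given size as (padded) Base64 text.
    static int encodedLength(int payloadBytes) {
        byte[] payload = new byte[payloadBytes];
        return Base64.getEncoder().encodeToString(payload).length();
    }

    public static void main(String[] args) {
        // 1K and 64K, the payload range mentioned in this thread.
        for (int size : new int[] {1024, 65536}) {
            System.out.println(size + " bytes -> "
                    + encodedLength(size) + " Base64 characters");
        }
    }
}
```

On top of the extra bytes, both sides also pay for encoding and decoding on
every call, which a binary field avoids.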

Will the clients of this hashmap know the types of the objects they  
are retrieving? If so, my original suggestion of using binary types  
stored and de/serializing at the application layer probably still  
applies.

-Bryan

On Jun 7, 2008, at 9:58 AM, Marcus Herou wrote:

> I'm not 100% sure why I switched, was thinking that the clients could send
> Base64 encoded strings back and forths but to be honest that's a little
> lame.
>
> It's 5 mins work to switch back and I will do so...
>
> /M
>
> On Sat, Jun 7, 2008 at 6:52 PM, Johan Stuyts <j....@zybber.nl> wrote:
>
>>> The only real constraint currently will be that the client need to
>>> encode/decode objects to a string representation. I initially made the
>>> cache store byte[] but switched to strings.
>>
>> Why did you make the switch? Thrift has a binary type which does what you
>> need and is supported by all language bindings. Using 'binary' should remove
>> the need to encode the data as strings on the client side.
>>
>> --
>> Kind regards,
>>
>> Johan Stuyts
>>
>
>
>
> -- 
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> marcus.herou@tailsweep.com
> http://www.tailsweep.com/
> http://blogg.tailsweep.com/


Re: [thrift] Serialization

Posted by Marcus Herou <ma...@tailsweep.com>.
I'm not 100% sure why I switched. I was thinking that the clients could send
Base64-encoded strings back and forth, but to be honest that's a little
lame.

It's 5 mins work to switch back and I will do so...

/M

On Sat, Jun 7, 2008 at 6:52 PM, Johan Stuyts <j....@zybber.nl> wrote:

>> The only real constraint currently will be that the client need to
>> encode/decode objects to a string representation. I initially made the
>> cache store byte[] but switched to strings.
>
> Why did you make the switch? Thrift has a binary type which does what you
> need and is supported by all language bindings. Using 'binary' should remove
> the need to encode the data as strings on the client side.
>
> --
> Kind regards,
>
> Johan Stuyts
>




Re: [thrift] Serialization

Posted by Johan Stuyts <j....@zybber.nl>.
> The only real constraint currently will be that the client need to
> encode/decode objects to a string representation. I initially made the
> cache store byte[] but switched to strings.

Why did you make the switch? Thrift has a binary type which does what you  
need and is supported by all language bindings. Using 'binary' should  
remove the need to encode the data as strings on the client side.

-- 
Kind regards,

Johan Stuyts

Re: [thrift] Serialization

Posted by Marcus Herou <ma...@tailsweep.com>.
Hi.

Thanks for the insights.

One of the reasons I want to use Thrift is that the Thrift client/server
seems to perform really, really well. I've created a simple cache which is
forced to store only strings in a HashMap. It can perform 11,000 lookups per
second and reaches a total read throughput of about 50 MB/s, which is in the
same league as if I were writing sequentially to a fast BTree or such. I have
tested it with payloads from 1K up to 64K. My own HTTP implementation does
not perform this well, dammit :)

The only real constraint currently is that the clients need to
encode/decode objects to a string representation. I initially made the cache
store byte[] but switched to strings.

Kindly

//Marcus


On Sat, Jun 7, 2008 at 1:47 PM, Johan Stuyts <j....@zybber.nl> wrote:

>> If you need to store heterogeneous collections of items, as you probably
>> would like to in a distributed hashmap, you should just make the type of
>> stored objects binary, and then handle the serialization/deserialization at
>> the application level. This will make your hashmap implementation very
>> general and simple, too.
>
> To me this would no longer be a Thrift solution. It requires that data is
> explicitly serialized/deserialized, and, more importantly, the serialization
> protocol is probably not implemented in the many languages that Thrift
> supports. What if somebody wants to use your Thrift service and writes a
> client program for it, only to find out that he cannot use the binary data
> stored in the maps? If you decide to go this route, make sure you understand
> the implications.
>
> If heterogeneous/polymorphic data (in containers or otherwise) is needed, I
> suggest to use another RPC protocol, e.g. RMI or SOAP.
>
> --
> Kind regards,
>
> Johan Stuyts
>




Re: [thrift] Serialization

Posted by Johan Stuyts <j....@zybber.nl>.
> If you need to store heterogeneous collections of items, as you probably  
> would like to in a distributed hashmap, you should just make the type of  
> stored objects binary, and then handle the serialization/deserialization  
> at the application level. This will make your hashmap implementation  
> very general and simple, too.

To me this would no longer be a Thrift solution. It requires that data is  
explicitly serialized/deserialized, and, more importantly, the  
serialization protocol is probably not implemented in the many languages  
that Thrift supports. What if somebody wants to use your Thrift service  
and writes a client program for it, only to find out that he cannot use  
the binary data stored in the maps? If you decide to go this route, make  
sure you understand the implications.

If heterogeneous/polymorphic data (in containers or otherwise) is needed,
I suggest using another RPC protocol, e.g. RMI or SOAP.

-- 
Kind regards,

Johan Stuyts

Re: [thrift] Serialization

Posted by Bryan Duxbury <br...@rapleaf.com>.
If you need to store heterogeneous collections of items, as you
probably would like to in a distributed hashmap, you should just make
the type of stored objects binary, and then handle the
serialization/deserialization at the application level. This will make
your hashmap implementation very general and simple, too.
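
Roughly, the idea could be sketched in Java like this. Everything here is
hypothetical: BinaryCache stands in for the server-side map behind a Thrift
service with binary values, and ValueCodec is one possible application-level
format that uses a one-byte type tag so that strings and longs can live in
the same map. A real setup would need the same codec in every client
language.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache: it only ever sees opaque bytes, never typed objects.
class BinaryCache {
    private final Map<String, byte[]> store = new HashMap<>();

    void put(String key, byte[] value) {
        store.put(key, value.clone());
    }

    byte[] get(String key) {
        byte[] value = store.get(key);
        return value == null ? null : value.clone();
    }
}

// Hypothetical application-level codec: a type tag followed by the payload.
class ValueCodec {
    static final byte TYPE_STRING = 1;
    static final byte TYPE_LONG = 2;

    static byte[] encode(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (value instanceof String) {
                out.writeByte(TYPE_STRING);
                out.writeUTF((String) value); // Java's modified UTF-8
            } else if (value instanceof Long) {
                out.writeByte(TYPE_LONG);
                out.writeLong((Long) value);
            } else {
                throw new IllegalArgumentException("unsupported value: " + value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte tag = in.readByte();
            switch (tag) {
                case TYPE_STRING: return in.readUTF();
                case TYPE_LONG: return in.readLong();
                default: throw new IllegalArgumentException("unknown type tag: " + tag);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BinaryCache cache = new BinaryCache();
        cache.put("name", encode("Marcus"));
        cache.put("hits", encode(11000L));
        System.out.println(decode(cache.get("name")));
        System.out.println(decode(cache.get("hits")));
    }
}
```

The cache itself stays trivial because all type knowledge lives in the
codec, which is the part each client has to agree on.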

-Bryan

On Jun 6, 2008, at 11:51 AM, Johan Stuyts wrote:

> Hi Marcus,
>
> I am moving this discussion to the Thrift user list at Apache  
> (incubator), because this list will be closed in the future. You  
> can join this list by sending an e-mail with subject  
> 'subscribe' (without the quotes) to:
> thrift-user-request@incubator.apache.org
>
>> The nature of a hashmap is that it stores objects which could be  
>> typed (or
>> not) so the "normal" way of doing this is to just use plain Java
>> serialization when you need to distribute the objects.
>>
>> I fear that the objects must be strongly typed and defined as  
>> structs right?
>
> Correct. And Thrift only supports homogenous collections, i.e. it  
> is not possible to put different types, or even subclasses of types  
> in a container. This limitation is there to allow protocols to send  
> only minimal information across the wire, unlike for example Java  
> serialization which always includes all metadata.
>
> -- 
> Kind regards,
>
> Johan Stuyts