Posted to user@thrift.apache.org by Raghava Mutharaju <m....@gmail.com> on 2010/07/23 22:27:09 UTC

C++ deserialization - record length?

Hi all,

I have serialized a couple of Employee objects (binary protocol) and saved the
byte array to a file. I am using C++ to deserialize them. I read the bytes of
one Employee from the file at a time and pass them to the read() method, but if
I use sizeof(Employee) as the record length, it does not give the right size.

Some statistics from a sample run:

Total Employee objects serialized = 10
Total bytes in the file = 690. So each Employee object size should be 69
bytes.
sizeof(Employee) gives 96.

This seems like a common operation. How can this be done?
My plan is to read each object's bytes, fill a TMemoryBuffer with them, and use
that buffer to construct a TBinaryProtocol.

Thank you.

Regards,
Raghava.

Re: C++ deserialization - record length?

Posted by Bryan Duxbury <br...@rapleaf.com>.
I typically use a 4-byte length prefix to store the size of the record that
follows. You may be able to get away with fewer bytes through a variety of
strategies.
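A minimal sketch of the 4-byte length-prefix approach, using only the C++ standard library. Each `payload` string stands in for the bytes Thrift produced for one object (for example, the contents of a TMemoryBuffer after the generated `write()` has run). The helper names `writeFramed`, `readFramed` and `readAllFramed` are hypothetical, and the big-endian byte order of the prefix is an arbitrary choice; the writer and reader only have to agree on it.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes one record as a 4-byte big-endian length followed by the payload bytes.
// `payload` stands in for the bytes a Thrift serializer produced for one object.
void writeFramed(std::ostream& out, const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu && "record too large for a 4-byte prefix");
    const uint32_t n = static_cast<uint32_t>(payload.size());
    const char header[4] = {
        static_cast<char>((n >> 24) & 0xFF), static_cast<char>((n >> 16) & 0xFF),
        static_cast<char>((n >> 8) & 0xFF), static_cast<char>(n & 0xFF)};
    out.write(header, 4);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Reads the next record into `payload`. Returns false at end of stream,
// or if the stream ends partway through a record.
bool readFramed(std::istream& in, std::string& payload) {
    unsigned char header[4];
    if (!in.read(reinterpret_cast<char*>(header), 4)) return false;
    const uint32_t n = (uint32_t(header[0]) << 24) | (uint32_t(header[1]) << 16) |
                       (uint32_t(header[2]) << 8) | uint32_t(header[3]);
    payload.assign(n, '\0');
    return n == 0 || static_cast<bool>(in.read(&payload[0], n));
}

// Reads records back to back until the stream is exhausted.
std::vector<std::string> readAllFramed(std::istream& in) {
    std::vector<std::string> records;
    std::string record;
    while (readFramed(in, record)) records.push_back(record);
    return records;
}
```

Each payload returned by `readAllFramed` is what you would copy into a TMemoryBuffer and hand to TBinaryProtocol, as described in the original question.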

On Fri, Jul 23, 2010 at 2:22 PM, Raghava Mutharaju <m.vijayaraghava@gmail.com> wrote:

Re: C++ deserialization - record length?

Posted by Raghava Mutharaju <m....@gmail.com>.
Hi Bryan,

Thank you for the reply. In that case, I will try the first option of
preceding the data with its length. How would the second approach (hooking
the deserializer up to the file stream) be done?

With the first solution, isn't there an issue with how many bytes to use for
storing the size of the data? If I use one byte, I can only record objects of
at most 2^8 - 1 = 255 bytes. Isn't this a problem?

Thank you.

Regards,
Raghava.
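On the one-byte question: an unsigned prefix of k bytes can describe records of up to 2^(8k) - 1 bytes, so one byte caps records at 255 bytes, while four bytes allow records of up to 4294967295 bytes. One possible way to spend fewer bytes on small records, swapped in here as an illustration rather than something named in the thread, is a variable-length integer: a LEB128-style varint, the same idea Thrift's compact protocol uses for integers. All function names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Largest record size an unsigned prefix of `prefixBytes` bytes can describe:
// 1 -> 255, 2 -> 65535, 4 -> 4294967295.
uint64_t maxRecordSize(unsigned prefixBytes) {
    assert(prefixBytes >= 1 && prefixBytes <= 8);
    return prefixBytes == 8 ? UINT64_MAX : (uint64_t(1) << (8 * prefixBytes)) - 1;
}

// LEB128-style varint: 7 value bits per byte, high bit set on every byte but the last.
// Sizes below 128 take 1 byte, sizes below 16384 take 2 bytes, and so on.
void appendVarint(std::string& out, uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Decodes one varint starting at `pos` and advances `pos` past it. Returns false if
// the input ends before the varint does, or if the varint runs past 10 bytes.
bool readVarint(const std::string& in, std::size_t& pos, uint64_t& value) {
    value = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) return false;
        const uint8_t byte = static_cast<uint8_t>(in[pos++]);
        value |= uint64_t(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;
}
```

A writer would call `appendVarint` with the record length before appending the record bytes; a reader calls `readVarint` and then takes that many bytes.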

On Fri, Jul 23, 2010 at 5:07 PM, Bryan Duxbury <br...@rapleaf.com> wrote:

Re: C++ deserialization - record length?

Posted by Bryan Duxbury <br...@rapleaf.com>.
1) Only if their fields serialize to the same lengths (for example, if they had
the exact same contents). Which means, probably not.

The usual solution to this problem is to serialize into a buffer and write
the size to the file before writing the serialized data. An alternative is
to hook your deserializer up directly to the file stream and let Thrift
figure out where each record starts and ends, though this can be a bit tricky.
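The second approach works because the binary protocol is self-delimiting at the struct level: each field starts with a 1-byte type code and a 2-byte big-endian field id, and a struct ends with a single T_STOP (0) byte, so a reader knows where one record ends without any length prefix. The toy below mimics that framing for a hypothetical Employee with an i32 `id` (field 1) and a string `name` (field 2). It is not Thrift's generated `read()`, which handles every type and skips unknown fields; this sketch handles only i32 and string fields and throws on anything else. It also shows one of the tricky parts: a record cut off at the end of the file has to be detected rather than silently accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Type codes from Thrift's binary protocol (only the subset this sketch handles).
const uint8_t T_STOP = 0, T_I32 = 8, T_STRING = 11;

// Hypothetical Employee layout: field 1 = id (i32), field 2 = name (string).
struct Employee {
    int32_t id;
    std::string name;
};

// Appends `v` as a big-endian integer `width` bytes wide.
void putBE(std::string& out, uint32_t v, int width) {
    for (int i = width - 1; i >= 0; --i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

void writeEmployee(std::string& out, const Employee& e) {
    assert(e.name.size() <= 0x7FFFFFFFu);  // string lengths are written as an i32
    out.push_back(static_cast<char>(T_I32));
    putBE(out, 1, 2);                                     // field id 1
    putBE(out, static_cast<uint32_t>(e.id), 4);
    out.push_back(static_cast<char>(T_STRING));
    putBE(out, 2, 2);                                     // field id 2
    putBE(out, static_cast<uint32_t>(e.name.size()), 4);  // length, then the bytes
    out += e.name;
    out.push_back(static_cast<char>(T_STOP));  // end-of-struct marker: makes records self-delimiting
}

// Minimal cursor over the concatenated records; throws on a truncated record.
class Reader {
public:
    explicit Reader(const std::string& buf) : buf_(buf) {}
    bool atEnd() const { return pos_ >= buf_.size(); }
    uint8_t byte() {
        if (atEnd()) throw std::runtime_error("truncated record");
        return static_cast<uint8_t>(buf_[pos_++]);
    }
    uint32_t be(int width) {  // big-endian unsigned integer, `width` bytes wide
        uint32_t v = 0;
        for (int i = 0; i < width; ++i) v = (v << 8) | byte();
        return v;
    }
    std::string bytes(uint32_t n) {
        if (n > buf_.size() - pos_) throw std::runtime_error("truncated record");
        std::string s = buf_.substr(pos_, n);
        pos_ += n;
        return s;
    }
private:
    const std::string& buf_;
    std::size_t pos_ = 0;
};

// Reads fields until T_STOP; the byte after T_STOP is the start of the next record.
Employee readEmployee(Reader& r) {
    Employee e{};
    for (;;) {
        const uint8_t type = r.byte();
        if (type == T_STOP) return e;
        const uint32_t fieldId = r.be(2);
        if (type == T_I32) {
            const uint32_t v = r.be(4);
            if (fieldId == 1) e.id = static_cast<int32_t>(v);
        } else if (type == T_STRING) {
            std::string s = r.bytes(r.be(4));
            if (fieldId == 2) e.name = s;
        } else {
            throw std::runtime_error("field type not handled in this sketch");
        }
    }
}

// Reads Employee records back to back with no length prefixes.
std::vector<Employee> readAllEmployees(const std::string& data) {
    Reader r(data);
    std::vector<Employee> all;
    while (!r.atEnd()) all.push_back(readEmployee(r));
    return all;
}
```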

On Fri, Jul 23, 2010 at 1:47 PM, Raghava Mutharaju <m.vijayaraghava@gmail.com> wrote:

Re: C++ deserialization - record length?

Posted by Raghava Mutharaju <m....@gmail.com>.
Hi Bryan,

Thank you for the reply. In that case, I have a couple more questions.

1) Would the buffers of two Employee objects have the same size?
2) If the answer to the above question is 'yes', then I need to pass the
buffer size somehow to the other application, which deserializes this list of
objects. Would writing this value to a separate file be a good option?

Thank you.

Regards,
Raghava.

On Fri, Jul 23, 2010 at 4:38 PM, Bryan Duxbury <br...@rapleaf.com> wrote:

Re: C++ deserialization - record length?

Posted by Bryan Duxbury <br...@rapleaf.com>.
Serialized Thrift objects aren't fixed-size, and their in-memory representation
doesn't reflect their serialized representation. sizeof() only reports the size
of the C++ struct in memory, which is fixed at compile time, so you need to look
at the size of the buffer after you write the object out, not the size of the
struct.
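To make the point about sizeof() concrete, the sketch below counts by hand the bytes the binary protocol writes for a hypothetical Employee with an i32 `id` and a string `name` (not the actual struct from the question), and compares that with sizeof. The serialized size grows with the name, while sizeof is the same for every instance. In real code you would take the size from the buffer after writing rather than computing it like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical Employee with two fields: an i32 id and a string name.
struct Employee {
    int32_t id;
    std::string name;
};

// Bytes the binary protocol writes for this struct:
//   field 1 (i32):    1 type byte + 2 field-id bytes + 4 value bytes        = 7
//   field 2 (string): 1 type byte + 2 field-id bytes + 4 length bytes + len = 7 + len
//   end of struct:    1 T_STOP byte                                          = 1
std::size_t serializedSize(const Employee& e) {
    assert(e.name.size() <= 0x7FFFFFFFu);  // the string length is written as an i32
    return 7 + (7 + e.name.size()) + 1;
}

// sizeof, by contrast, is a compile-time constant: it counts the int32_t, any padding,
// and the std::string object itself, whose size does not depend on the text it holds.
constexpr std::size_t kInMemorySize = sizeof(Employee);
```

Note that two employees whose names have the same length serialize to the same number of bytes even though their contents differ.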

On Fri, Jul 23, 2010 at 1:27 PM, Raghava Mutharaju <m.vijayaraghava@gmail.com> wrote:
