Posted to user@thrift.apache.org by "John R. Frank" <jr...@mit.edu> on 2013/03/14 11:46:52 UTC
updating to longer ints
Thrift Experts,
Is there any way to make a thrift message with a longer integer read an
older message definition with a smaller integer?
struct Thing_original {
1: i16 idx,
}
struct Thing_newer {
1: i32 idx,
}
I'd like to be able to have Thing_newer read *both* kinds of idx. Simple
tests show that this does not work.
Is there a way to tell Thing_newer to fall back to looking for (idx, i16)
if it cannot find (idx, i32)?
John
Re: updating to longer ints
Posted by "John R. Frank" <jr...@mit.edu>.
> The Thrift "way" would be to create a new field for the i32 field, say
> 101.
>
> In other words, the logic that you propose would have to be handled at
> the application layer.
To enable that application layer, there would have to be two differently
named properties on Thing, like idx and idx32
This raises a bigger question: versioning.
The real form of our Thing is this public interface, which has a
"Versions" struct that is used today just for debugging. I wish I could
use it for more.
https://github.com/trec-kba/streamcorpus/blob/master/if/streamcorpus.thrift
We use these thrift interfaces to store this large public data set of
text:
http://s3.amazonaws.com/aws-publicdatasets/trec/index.html
Right now, any program that uses the streamcorpus.thrift interfaces must
be told by the user whether it is processing v0_1_0 or v0_2_0. Soon there
will be v0_3_0 to cope with our picking i16 when we needed i32 for
MentionID. We do re-spin the entire corpus occasionally, but we prefer
to save up changes and spend a CPU-decade on several changes at once.
It would be really helpful/powerful for thrift to handle such versioning
for us. Most simply, there could be a thrift primitive at the level of a
whole message that says "I am version X", where the engineer gets to
specify X in the .thrift file.
At a more granular level, versioning could be used to enable falling back
to different meanings for properties with the same name -- probably with
different slot numbers and types.
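For illustration only (none of these names come from the thread), an application-level approximation of such a version primitive could be a version constant in the IDL plus a version field carried on each serialized message:

```thrift
// Hypothetical sketch: the .thrift file declares its own schema version...
const string SCHEMA_VERSION = "v0_3_0"

struct Thing {
  1: i32 idx,
  // ...and each serialized instance records the version it was written
  // with, so readers can branch at the application layer.
  2: optional string version = "v0_3_0",
}
```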
Does anything like this exist yet?
Would the Thrift community be open to considering such a feature? Maybe
this has already been discussed?
Thanks for listening,
jrf
Re: updating to longer ints
Posted by George Chung <ge...@glympse.com>.
I'm pretty sure the answer would be "no".
The Thrift "way" would be to create a new field for the i32 field, say 101.
The backwards compatible service would be able to read from field 1 or
field 101. Newer services would only produce an integer value at field 101.
Older services would provide the integer value at field 1.
In other words, the logic that you propose would have to be handled at the
application layer.
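A minimal IDL sketch of that dual-field layout (the struct name, the choice of field id 101, and the `optional` qualifiers are illustrative, not taken from the thread):

```thrift
struct Thing {
  // Legacy field: written by old producers, read for backwards compatibility.
  1: optional i16 idx,
  // New field: upgraded producers write only this one.
  101: optional i32 idx32,
}
```

A reader would then check `idx32` first and fall back to `idx`, entirely in application code.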
On Thu, Mar 14, 2013 at 3:46 AM, John R. Frank <jr...@mit.edu> wrote:
> Thrift Experts,
>
> Is there any way to make a thrift message with a longer integer read an
> older message definition with a smaller integer?
>
> struct Thing_original {
> 1: i16 idx,
> }
>
> struct Thing_newer {
> 1: i32 idx,
> }
>
>
> I'd like to be able to have Thing_newer read *both* kinds of idx. Simple
> tests show that this does not work.
>
> Is there a way to tell Thing_newer to fall back to looking for (idx, i16)
> if it cannot find (idx, i32)?
>
>
> John
>
Re: updating to longer ints
Posted by Jens Geyer <je...@hotmail.com>.
Hi Henrique,
> Since the i16 side wouldn't be able to read the i32's
> that would come either as an input or output of your
> function. Therefore, to make it work we would have
> to be able to read larger numbers too, and cast them
> down.
Agree. The size of the internal data field is the limit that counts, as the
data have to be stored somewhere. So the last case would then become
something like:
case ::apache::thrift::protocol::T_I64:
  int64_t tmp4712;
  xfer += iprot->readI64(tmp4712);
  if ( (tmp4712 < std::numeric_limits<int32_t>::min()) ||
       (tmp4712 > std::numeric_limits<int32_t>::max())) {
    throw TProtocolException(TProtocolException::SIZE_LIMIT);
  }
  this->thing = (int32_t)tmp4712;
  this->__isset.thing = true;
  break;
All of this should be achievable by means of a specialized protocol
wrapper, so we don't even have to bloat (and modify) the generated code.
Looking for a good name .... do you have one?
Jens
-----Ursprüngliche Nachricht-----
From: Henrique Mendonça
Sent: Tuesday, March 26, 2013 11:03 PM
To: user@thrift.apache.org ; dev@thrift.apache.org
Subject: Re: updating to longer ints
Hi Jens,
That's an interesting proposition, and sorry for the delayed answer, but if
I'm not wrong, with your example it would generate exceptions regardless of
which side you update first, since the i16 side wouldn't be able to read the
i32's that would come either as input or output of your function. Therefore,
to make it work we would have to be able to read larger numbers too, and cast
them down. This could be a little tricky for a user to understand the
results, and it would increase the complexity of the compiler quite a bit.
However, it's theoretically possible and I don't think it would affect the
performance that much.
- Henrique
On 15 March 2013 19:57, Jens Geyer <je...@hotmail.com> wrote:
> I think TCompactProtocol uses var length ints, achieving
>> something similar to what you mentioned.
>>
>
> Yep, but good point anyway.
>
>
> Not that this would help w/ the original issue.
>>
>
> Of course, because it is only "something similar" and suffers from the
> same limitations. What the OP suggests and/or what I was thinking of is
> something like this (this would be generated C++ code for one field) :
>
> case 1:
> switch( ftype) {
> case ::apache::thrift::protocol::T_BYTE:
> int8_t tmp4710;
> xfer += iprot->readByte(tmp4710);
> this->thing = tmp4710;
> this->__isset.thing = true;
> break;
> case ::apache::thrift::protocol::T_I16:
> int16_t tmp4711;
> xfer += iprot->readI16(tmp4711);
> this->thing = tmp4711;
> this->__isset.thing = true;
> break;
>
> case ::apache::thrift::protocol::T_I32:
> xfer += iprot->readI32(this->thing);
> this->__isset.thing = true;
> break;
> case ::apache::thrift::protocol::T_I64:
> throw TProtocolException( TProtocolException::SIZE_LIMIT);
> default:
> xfer += iprot->skip(ftype);
> break;
> }
> break;
>
> and similarly with the generated write() code. This way all protocols
> would profit from it (except for those where it makes no difference with
> regard to serialized data, such as JSON).
>
> The two biggest problems that I see are a) it breaks compatibility, so we
> have to increase the VERSION number, and b) it might affect overall
> serialization/deserialization performance.
>
> Jens
Re: updating to longer ints
Posted by Henrique Mendonça <he...@apache.org>.
Hi Jens,
That's an interesting proposition, and sorry for the delayed answer, but if
I'm not wrong, with your example it would generate exceptions regardless of
which side you update first, since the i16 side wouldn't be able to read the
i32's that would come either as input or output of your function. Therefore,
to make it work we would have to be able to read larger numbers too, and cast
them down. This could be a little tricky for a user to understand the
results, and it would increase the complexity of the compiler quite a bit.
However, it's theoretically possible and I don't think it would affect the
performance that much.
- Henrique
On 15 March 2013 19:57, Jens Geyer <je...@hotmail.com> wrote:
> I think TCompactProtocol uses var length ints, achieving
>> something similar to what you mentioned.
>>
>
> Yep, but good point anyway.
>
>
> Not that this would help w/ the original issue.
>>
>
> Of course, because it is only "something similar" and suffers from the
> same limitations. What the OP suggests and/or what I was thinking of is
> something like this (this would be generated C++ code for one field) :
>
> case 1:
> switch( ftype) {
> case ::apache::thrift::protocol::T_BYTE:
> int8_t tmp4710;
> xfer += iprot->readByte(tmp4710);
> this->thing = tmp4710;
> this->__isset.thing = true;
> break;
> case ::apache::thrift::protocol::T_I16:
> int16_t tmp4711;
> xfer += iprot->readI16(tmp4711);
> this->thing = tmp4711;
> this->__isset.thing = true;
> break;
>
> case ::apache::thrift::protocol::T_I32:
> xfer += iprot->readI32(this->thing);
> this->__isset.thing = true;
> break;
> case ::apache::thrift::protocol::T_I64:
> throw TProtocolException(TProtocolException::SIZE_LIMIT);
> default:
> xfer += iprot->skip(ftype);
> break;
> }
> break;
>
> and similarly with the generated write() code. This way all protocols
> would profit from it (except for those where it makes no difference with
> regard to serialized data, such as JSON).
>
> The two biggest problems that I see are a) it breaks compatibility, so we
> have to increase the VERSION number, and b) it might affect overall
> serialization/deserialization performance.
>
> Jens
Re: updating to longer ints
Posted by Jens Geyer <je...@hotmail.com>.
> I think TCompactProtocol uses var length ints, achieving
> something similar to what you mentioned.
Yep, but good point anyway.
> Not that this would help w/ the original issue.
Of course, because it is only "something similar" and suffers from the same
limitations. What the OP suggests and/or what I was thinking of is something
like this (this would be generated C++ code for one field) :
case 1:
  switch( ftype) {
    case ::apache::thrift::protocol::T_BYTE:
      int8_t tmp4710;
      xfer += iprot->readByte(tmp4710);
      this->thing = tmp4710;
      this->__isset.thing = true;
      break;
    case ::apache::thrift::protocol::T_I16:
      int16_t tmp4711;
      xfer += iprot->readI16(tmp4711);
      this->thing = tmp4711;
      this->__isset.thing = true;
      break;
    case ::apache::thrift::protocol::T_I32:
      xfer += iprot->readI32(this->thing);
      this->__isset.thing = true;
      break;
    case ::apache::thrift::protocol::T_I64:
      throw TProtocolException(TProtocolException::SIZE_LIMIT);
    default:
      xfer += iprot->skip(ftype);
      break;
  }
  break;
and similarly with the generated write() code. This way all protocols would
profit from it (except for those where it makes no difference with regard to
serialized data, such as JSON).
The two biggest problems that I see are a) it breaks compatibility, so we
have to increase the VERSION number, and b) it might affect overall
serialization/deserialization performance.
Jens
Re: updating to longer ints
Posted by Keith Turner <ke...@deenlo.com>.
On Fri, Mar 15, 2013 at 4:07 AM, Jens Geyer <je...@hotmail.com> wrote:
> Hi,
>
> I like the idea of the data types getting promoted to bigger ones on read,
> but I'm still afraid it won't work. Let's assume a service like this:
>
> struct foo {
> 1: i16 bignumber,
> 2: i16 anotherone,
> 3: i16 third
> }
>
> service bar {
> foo Something( 1: foo input)
> }
>
> Now, after some time, you find out that i16 is way too small, so you change
> foo{} to hold all i32's.
>
> First, the real issue is that when not all clients and servers are under your
> control and can be updated at once, it will break. The generated code holds
> declarations for all variables, so values can be promoted on the fly only as
> far as the current data structures allow.
>
> But OTOH the idea has some good aspects, since it brings in the idea that
> the size of the variable must not necessarily be equal to the amount of the
> bytes on the wire, e.g. an int32 = 0x00001234 could be read/written as i16
> without losing information. The only thing that may be affected here is the
> performance, because the code has to check for appropriate write and read
> sizes.
I think TCompactProtocol uses var length ints, achieving something
similar to what you mentioned. Not that this would help w/ the
original issue.
>
> Jens
>
>
> -----Ursprüngliche Nachricht----- From: Will Pierce
> Sent: Thursday, March 14, 2013 10:47 PM
> To: user@thrift.apache.org
> Subject: Re: updating to longer ints
>
>
> Definitely not an easy problem. On the wire these will both encode as the
> same field ID (1) and a type ID of 6 for the i16 or a type ID of 8 for the
> i32. Most (all?) thrift language bindings check to ensure the type ID for
> a field matches the expected type ID for that particular field, silently
> skipping past any field with mismatched type IDs.
>
> George Chung's suggestion is the safest/cleanest approach.
>
> I had exactly this same problem where I needed to expand from i32 to i64.
> It was a "flag day" cutover for me, since I had control of all the clients
> and servers. But I can see how you frequently can't upgrade all the client
> code out in the wild. (Or data files, if these are persisted.)
>
> Rather than try to solve the broader problem of versioning, maybe we should
> consider adding optional behavior to the decoder to permit
> *type-promotion* when the field's on-wire type can be safely decoded
> into the expected type?
>
> Provided that the expected type can faithfully represent all the values of
> the wire type, it seems like a way to follow the "TCP" pattern of being
> permissive in what you receive, but strict in what you send.
>
> In terms of thrift TTypes, we could allow a decoder to promote:
> bool > i08 > i16 > i32 > (double or i64)
>
> This wouldn't solve every problem - like sending a response struct with an
> i32 field to a client that expects i16. No go there. This would be only
> useful in cases of having to read client input from a mixed bag of old and
> new clients, where the server response struct doesn't need any wider
> fields. I'm not sure if that is your use-case...
>
> For those using thrift for serialization to/from disk, I would imagine
> type-promotion could save a lot of effort in re-encoding any old files that
> use narrower numeric types than newer files...
>
> Thoughts?
Re: updating to longer ints
Posted by Jens Geyer <je...@hotmail.com>.
Hi,
I like the idea of the data types getting promoted to bigger ones on read,
but I'm still afraid it won't work. Let's assume a service like this:
struct foo {
1: i16 bignumber,
2: i16 anotherone,
3: i16 third
}
service bar {
foo Something( 1: foo input)
}
Now, after some time, you find out that i16 is way too small, so you change
foo{} to hold all i32's.
First, the real issue is that when not all clients and servers are under your
control and can be updated at once, it will break. The generated code holds
declarations for all variables, so values can be promoted on the fly only as
far as the current data structures allow.
But OTOH the idea has some good aspects, since it brings in the idea that
the size of the variable must not necessarily be equal to the amount of the
bytes on the wire, e.g. an int32 = 0x00001234 could be read/written as i16
without losing information. The only thing that may be affected here is the
performance, because the code has to check for appropriate write and read
sizes.
Jens
-----Ursprüngliche Nachricht-----
From: Will Pierce
Sent: Thursday, March 14, 2013 10:47 PM
To: user@thrift.apache.org
Subject: Re: updating to longer ints
Definitely not an easy problem. On the wire these will both encode as the
same field ID (1) and a type ID of 6 for the i16 or a type ID of 8 for the
i32. Most (all?) thrift language bindings check to ensure the type ID for
a field matches the expected type ID for that particular field, silently
skipping past any field with mismatched type IDs.
George Chung's suggestion is the safest/cleanest approach.
I had exactly this same problem where I needed to expand from i32 to i64.
It was a "flag day" cutover for me, since I had control of all the clients
and servers. But I can see how you frequently can't upgrade all the client
code out in the wild. (Or data files, if these are persisted.)
Rather than try to solve the broader problem of versioning, maybe we should
consider adding optional behavior to the decoder to permit
*type-promotion* when the field's on-wire type can be safely decoded
into the expected type?
Provided that the expected type can faithfully represent all the values of
the wire type, it seems like a way to follow the "TCP" pattern of being
permissive in what you receive, but strict in what you send.
In terms of thrift TTypes, we could allow a decoder to promote:
bool > i08 > i16 > i32 > (double or i64)
This wouldn't solve every problem - like sending a response struct with an
i32 field to a client that expects i16. No go there. This would be only
useful in cases of having to read client input from a mixed bag of old and
new clients, where the server response struct doesn't need any wider
fields. I'm not sure if that is your use-case...
For those using thrift for serialization to/from disk, I would imagine
type-promotion could save a lot of effort in re-encoding any old files that
use narrower numeric types than newer files...
Thoughts?
Re: updating to longer ints
Posted by Will Pierce <wi...@nuclei.com>.
Definitely not an easy problem. On the wire these will both encode as the
same field ID (1) and a type ID of 6 for the i16 or a type ID of 8 for the
i32. Most (all?) thrift language bindings check to ensure the type ID for
a field matches the expected type ID for that particular field, silently
skipping past any field with mismatched type IDs.
George Chung's suggestion is the safest/cleanest approach.
I had exactly this same problem where I needed to expand from i32 to i64.
It was a "flag day" cutover for me, since I had control of all the clients
and servers. But I can see how you frequently can't upgrade all the client
code out in the wild. (Or data files, if these are persisted.)
Rather than try to solve the broader problem of versioning, maybe we should
consider adding optional behavior to the decoder to permit
*type-promotion* when the field's on-wire type can be safely decoded
into the expected type?
Provided that the expected type can faithfully represent all the values of
the wire type, it seems like a way to follow the "TCP" pattern of being
permissive in what you receive, but strict in what you send.
In terms of thrift TTypes, we could allow a decoder to promote:
bool > i08 > i16 > i32 > (double or i64)
This wouldn't solve every problem - like sending a response struct with an
i32 field to a client that expects i16. No go there. This would be only
useful in cases of having to read client input from a mixed bag of old and
new clients, where the server response struct doesn't need any wider
fields. I'm not sure if that is your use-case...
For those using thrift for serialization to/from disk, I would imagine
type-promotion could save a lot of effort in re-encoding any old files that
use narrower numeric types than newer files...
Thoughts?