You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by "Benjy Weinberger (Created) (JIRA)" <ji...@apache.org> on 2011/12/31 09:38:30 UTC

[jira] [Created] (THRIFT-1477) Allow readFieldBegin() to pass back the field name instead of the field id

Allow readFieldBegin() to pass back the field name instead of the field id
--------------------------------------------------------------------------

                 Key: THRIFT-1477
                 URL: https://issues.apache.org/jira/browse/THRIFT-1477
             Project: Thrift
          Issue Type: Improvement
          Components: Java - Compiler
            Reporter: Benjy Weinberger
            Priority: Minor


[Apologies if this has been addressed in another issue. I couldn't find anything relevant on JIRA or the mailing list archives.]

Background: I'm implementing a BSON protocol, in order to write Thrift messages to MongoDB (technically the protocol generates the object representation that the MongoDB driver expects, not a raw BSON string directly to the transport, but that's an unimportant detail here). 

BSON, like JSON, naturally uses human-readable string field names. 

When reading, the generated Thrift code (at least in Java) requires that readFieldBegin() pass back a TField with the id field set. It ignores the name field. Therefore the ids must appear in the stream. It's possible to contort these protocols to use ids instead of human-readable names (as TJSONProtocol does) but this isn't helpful in dealing with prior BSON or JSON data that we're trying to back-port into Thrift schemata.

However, the generated read() method already knows how to map names to ids. So I propose allowing a TProtocol's readFieldBegin() method to pass back a TField with the name set and no id set (indicated, say, by id==-1), and let the read() method figure out the id to then switch on. 

In some cases we could also allow the TField to omit the type information, which, again, is not naturally present in JSON. (BSON does embed type information, but its type system does not align fully with Thrift's, so it can't be used without further context). If the field is unknown, the only use for the type is for skipping the field value. But protocols like JSON and BSON can skip fields without this type information, since fields are delimited in the protocol in a type-independent way.

Basically, what I propose is that readFieldBegin() be allowed to pass back just an id or just a name (and, for some protocols, no type information), since that is all read() needs in order to figure out how to read or skip the field. 

I'm wondering what the Thrift elders think of this. Has it been discussed? Thanks!


PS This does have the downside that if Thrift were to implement a pass-through feature for unrecognized fields (so that new messages read with old protocol versions will serialize back out with no loss) - we wouldn't be able to preserve fields for which we only had a name and no id. Or rather, we wouldn't be able to write them out to a protocol that requires ids, like the binary protocols. However this feature doesn't exist anyway, and I don't know if it's on the roadmap. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Commented] (THRIFT-1477) Allow readFieldBegin() to pass back the field name instead of the field id

Posted by Bryan Duxbury <br...@rapleaf.com>.
If you're working on 0.8+, you could always just create a new Scheme to go
with your Protocol that never uses IDs and always uses names. See the
TupleProtocol for an example of using a non-standard scheme.

On Tue, Jan 3, 2012 at 9:53 AM, Benjy Weinberger <be...@gmail.com> wrote:

> Sadly no. I have tried this extensively, but if read() can't accept
> this information directly then the TProtocol implementation has to
> receive it out-of-band somehow. It seemed like it would be possible to
> generate it up front from the available FieldValueMetaData - starting
> at the root class and working your way down. Unfortunately this:
>
> A) greatly complicates the TProtocol implementation
>
> and
>
> B) is defeated by the presence of typedefs, since, unfortunately and
> for reasons I'm not clear on, the FieldValueMetaData for a typedef
> describes the typedef, not the thing it refers to. (*)
>
>
> There is a trick using stack traces (see (**) for details), but it:
>
> A) greatly complicates the TProtocol implementation
>
> B) is fragile
>
> C) performs poorly and is unsuitable for production use
>
> D) is defeated by the presence of unions.
>
>
> My conclusion, and believe me this was after several days of
> deep-diving into the issue, is that it is much simpler and more
> natural to fix up the read() method.
>
> It's simpler because it isn't a particularly big change. And it's
> probably better to add complication to generated code than to
> hand-written code.
>
> It's more natural because the read() method already has the
> information it needs to read a field by name. Forcing the TProtocol to
> do the conversion means passing it out-of-band information that
> ultimately has to come from the generated struct code anyway. And the
> TProtocol should be about wire representation, not about id<->field
> mappings. That seems properly the role of the generated code.
>
> What do you think? I could put together a change to the compiler and
> we can see the degree of complexity it adds to the generated code is a
> problem.
>
> Benjy
>
>
>
>
> (*) This typedef metadata doesn't seem to be useful at runtime, since
> it just states the typedef alias, but this doesn't refer to anything
> you can reflect on at runtime.
>
> (**) FWIW there is a "TTextProtocol" in the twitter/commons libraries on
> github. It is essentially a JSON protocol that uses names.
>
> However because of the problem mentioned below it has to do some
> non-trivial state tracking, including the following trick to figure
> out which struct it's currently reading (so it knows which struct's
> metadata to use to map names to ids):
>
> In readStructBegin() it generates a stack trace and walks up it frame
> by frame, calling Class.forName() on the declaring class of each frame
> and tests it for assignability to TBase. The first TBase subclass
> encountered is assumed to be the currently read struct - calling
> readStructBegin() from its read() method.
>
> This is fragile (e.g., it doesn't work with unions because the first
> TBase subclass encountered will be TUnion) and has poor performance,
> so it's only used in tests.
>
>
> On Tue, Jan 3, 2012 at 9:40 AM, Bryan Duxbury (Commented) (JIRA)
> <ji...@apache.org> wrote:
> >
> >    [
> https://issues.apache.org/jira/browse/THRIFT-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178850#comment-13178850]
> >
> > Bryan Duxbury commented on THRIFT-1477:
> > ---------------------------------------
> >
> > This feels like it would unnecessarily complicate the read() method. Can
> you just figure out a way to do the name->id mapping yourself?
> >
> >> Allow readFieldBegin() to pass back the field name instead of the field
> id
> >>
> --------------------------------------------------------------------------
> >>
> >>                 Key: THRIFT-1477
> >>                 URL: https://issues.apache.org/jira/browse/THRIFT-1477
> >>             Project: Thrift
> >>          Issue Type: Improvement
> >>          Components: Java - Compiler
> >>            Reporter: Benjy Weinberger
> >>            Priority: Minor
> >>
> >> [Apologies if this has been addressed in another issue. I couldn't find
> anything relevant on JIRA or the mailing list archives.]
> >> Background: I'm implementing a BSON protocol, in order to write Thrift
> messages to MongoDB (technically the protocol generates the object
> representation that the MongoDB driver expects, not a raw BSON string
> directly to the transport, but that's an unimportant detail here).
> >> BSON, like JSON, naturally uses human-readable string field names.
> >> When reading, the generated Thrift code (at least in Java) requires
> that readFieldBegin() pass back a TField with the id field set. It ignores
> the name field. Therefore the ids must appear in the stream. It's possible
> to contort these protocols to use ids instead of human-readable names (as
> TJSONProtocol does) but this isn't helpful in dealing with prior BSON or
> JSON data that we're trying to back-port into Thrift schemata.
> >> However, the generated read() method already knows how to map names to
> ids. So I propose allowing a TProtocol's readFieldBegin() method to pass
> back a TField with the name set and no id set (indicated, say, by id==-1),
> and let the read() method figure out the id to then switch on.
> >> In some cases we could also allow the TField to omit the type
> information, which, again, is not naturally present in JSON. (BSON does
> embed type information, but its type system does not align fully with
> Thrift's, so it can't be used without further context). If the field is
> unknown, the only use for the type is for skipping the field value. But
> protocols like JSON and BSON can skip fields without this type information,
> since fields are delimited in the protocol in a type-independent way.
> >> Basically, what I propose is that readFieldBegin() be allowed to pass
> back just an id or just a name (and, for some protocols, no type
> information), since that is all read() needs in order to figure out how to
> read or skip the field.
> >> I'm wondering what the Thrift elders think of this. Has it been
> discussed? Thanks!
> >> PS This does have the downside that if Thrift were to implement a
> pass-through feature for unrecognized fields (so that new messages read
> with old protocol versions will serialize back out with no loss) - we
> wouldn't be able to preserve fields for which we only had a name and no id.
> Or rather, we wouldn't be able to write them out to a protocol that
> requires ids, like the binary protocols. However this feature doesn't exist
> anyway, and I don't know if it's on the roadmap.
> >
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> > For more information on JIRA, see:
> http://www.atlassian.com/software/jira
> >
> >
>

Re: [jira] [Commented] (THRIFT-1477) Allow readFieldBegin() to pass back the field name instead of the field id

Posted by Benjy Weinberger <be...@gmail.com>.
Sadly no. I have tried this extensively, but if read() can't accept
this information directly then the TProtocol implementation has to
receive it out-of-band somehow. It seemed like it would be possible to
generate it up front from the available FieldValueMetaData - starting
at the root class and working your way down. Unfortunately this:

A) greatly complicates the TProtocol implementation

and

B) is defeated by the presence of typedefs, since, unfortunately and
for reasons I'm not clear on, the FieldValueMetaData for a typedef
describes the typedef, not the thing it refers to. (*)


There is a trick using stack traces (see (**) for details), but it:

A) greatly complicates the TProtocol implementation

B) is fragile

C) performs poorly and is unsuitable for production use

D) is defeated by the presence of unions.


My conclusion, and believe me this was after several days of
deep-diving into the issue, is that it is much simpler and more
natural to fix up the read() method.

It's simpler because it isn't a particularly big change. And it's
probably better to add complication to generated code than to
hand-written code.

It's more natural because the read() method already has the
information it needs to read a field by name. Forcing the TProtocol to
do the conversion means passing it out-of-band information that
ultimately has to come from the generated struct code anyway. And the
TProtocol should be about wire representation, not about id<->field
mappings. That seems properly the role of the generated code.

What do you think? I could put together a change to the compiler and
we can see the degree of complexity it adds to the generated code is a
problem.

Benjy




(*) This typedef metadata doesn't seem to be useful at runtime, since
it just states the typedef alias, but this doesn't refer to anything
you can reflect on at runtime.

(**) FWIW there is a "TTextProtocol" in the twitter/commons libraries on
github. It is essentially a JSON protocol that uses names.

However because of the problem mentioned below it has to do some
non-trivial state tracking, including the following trick to figure
out which struct it's currently reading (so it knows which struct's
metadata to use to map names to ids):

In readStructBegin() it generates a stack trace and walks up it frame
by frame, calling Class.forName() on the declaring class of each frame
and tests it for assignability to TBase. The first TBase subclass
encountered is assumed to be the currently read struct - calling
readStructBegin() from its read() method.

This is fragile (e.g., it doesn't work with unions because the first
TBase subclass encountered will be TUnion) and has poor performance,
so it's only used in tests.


On Tue, Jan 3, 2012 at 9:40 AM, Bryan Duxbury (Commented) (JIRA)
<ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/THRIFT-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178850#comment-13178850 ]
>
> Bryan Duxbury commented on THRIFT-1477:
> ---------------------------------------
>
> This feels like it would unnecessarily complicate the read() method. Can you just figure out a way to do the name->id mapping yourself?
>
>> Allow readFieldBegin() to pass back the field name instead of the field id
>> --------------------------------------------------------------------------
>>
>>                 Key: THRIFT-1477
>>                 URL: https://issues.apache.org/jira/browse/THRIFT-1477
>>             Project: Thrift
>>          Issue Type: Improvement
>>          Components: Java - Compiler
>>            Reporter: Benjy Weinberger
>>            Priority: Minor
>>
>> [Apologies if this has been addressed in another issue. I couldn't find anything relevant on JIRA or the mailing list archives.]
>> Background: I'm implementing a BSON protocol, in order to write Thrift messages to MongoDB (technically the protocol generates the object representation that the MongoDB driver expects, not a raw BSON string directly to the transport, but that's an unimportant detail here).
>> BSON, like JSON, naturally uses human-readable string field names.
>> When reading, the generated Thrift code (at least in Java) requires that readFieldBegin() pass back a TField with the id field set. It ignores the name field. Therefore the ids must appear in the stream. It's possible to contort these protocols to use ids instead of human-readable names (as TJSONProtocol does) but this isn't helpful in dealing with prior BSON or JSON data that we're trying to back-port into Thrift schemata.
>> However, the generated read() method already knows how to map names to ids. So I propose allowing a TProtocol's readFieldBegin() method to pass back a TField with the name set and no id set (indicated, say, by id==-1), and let the read() method figure out the id to then switch on.
>> In some cases we could also allow the TField to omit the type information, which, again, is not naturally present in JSON. (BSON does embed type information, but its type system does not align fully with Thrift's, so it can't be used without further context). If the field is unknown, the only use for the type is for skipping the field value. But protocols like JSON and BSON can skip fields without this type information, since fields are delimited in the protocol in a type-independent way.
>> Basically, what I propose is that readFieldBegin() be allowed to pass back just an id or just a name (and, for some protocols, no type information), since that is all read() needs in order to figure out how to read or skip the field.
>> I'm wondering what the Thrift elders think of this. Has it been discussed? Thanks!
>> PS This does have the downside that if Thrift were to implement a pass-through feature for unrecognized fields (so that new messages read with old protocol versions will serialize back out with no loss) - we wouldn't be able to preserve fields for which we only had a name and no id. Or rather, we wouldn't be able to write them out to a protocol that requires ids, like the binary protocols. However this feature doesn't exist anyway, and I don't know if it's on the roadmap.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>

[jira] [Commented] (THRIFT-1477) Allow readFieldBegin() to pass back the field name instead of the field id

Posted by "Bryan Duxbury (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178850#comment-13178850 ] 

Bryan Duxbury commented on THRIFT-1477:
---------------------------------------

This feels like it would unnecessarily complicate the read() method. Can you just figure out a way to do the name->id mapping yourself? 
                
> Allow readFieldBegin() to pass back the field name instead of the field id
> --------------------------------------------------------------------------
>
>                 Key: THRIFT-1477
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1477
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Java - Compiler
>            Reporter: Benjy Weinberger
>            Priority: Minor
>
> [Apologies if this has been addressed in another issue. I couldn't find anything relevant on JIRA or the mailing list archives.]
> Background: I'm implementing a BSON protocol, in order to write Thrift messages to MongoDB (technically the protocol generates the object representation that the MongoDB driver expects, not a raw BSON string directly to the transport, but that's an unimportant detail here). 
> BSON, like JSON, naturally uses human-readable string field names. 
> When reading, the generated Thrift code (at least in Java) requires that readFieldBegin() pass back a TField with the id field set. It ignores the name field. Therefore the ids must appear in the stream. It's possible to contort these protocols to use ids instead of human-readable names (as TJSONProtocol does) but this isn't helpful in dealing with prior BSON or JSON data that we're trying to back-port into Thrift schemata.
> However, the generated read() method already knows how to map names to ids. So I propose allowing a TProtocol's readFieldBegin() method to pass back a TField with the name set and no id set (indicated, say, by id==-1), and let the read() method figure out the id to then switch on. 
> In some cases we could also allow the TField to omit the type information, which, again, is not naturally present in JSON. (BSON does embed type information, but its type system does not align fully with Thrift's, so it can't be used without further context). If the field is unknown, the only use for the type is for skipping the field value. But protocols like JSON and BSON can skip fields without this type information, since fields are delimited in the protocol in a type-independent way.
> Basically, what I propose is that readFieldBegin() be allowed to pass back just an id or just a name (and, for some protocols, no type information), since that is all read() needs in order to figure out how to read or skip the field. 
> I'm wondering what the Thrift elders think of this. Has it been discussed? Thanks!
> PS This does have the downside that if Thrift were to implement a pass-through feature for unrecognized fields (so that new messages read with old protocol versions will serialize back out with no loss) - we wouldn't be able to preserve fields for which we only had a name and no id. Or rather, we wouldn't be able to write them out to a protocol that requires ids, like the binary protocols. However this feature doesn't exist anyway, and I don't know if it's on the roadmap. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (THRIFT-1477) Allow readFieldBegin() to pass back the field name instead of the field id

Posted by "Dave Watson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178833#comment-13178833 ] 

Dave Watson commented on THRIFT-1477:
-------------------------------------

Like this idea.

Would love a way to have TJsonProtocol be able to read/write names instead of field ID's.  Proposal sounds great to me.

I don't see your downside as a huge use-case.  
                
> Allow readFieldBegin() to pass back the field name instead of the field id
> --------------------------------------------------------------------------
>
>                 Key: THRIFT-1477
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1477
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Java - Compiler
>            Reporter: Benjy Weinberger
>            Priority: Minor
>
> [Apologies if this has been addressed in another issue. I couldn't find anything relevant on JIRA or the mailing list archives.]
> Background: I'm implementing a BSON protocol, in order to write Thrift messages to MongoDB (technically the protocol generates the object representation that the MongoDB driver expects, not a raw BSON string directly to the transport, but that's an unimportant detail here). 
> BSON, like JSON, naturally uses human-readable string field names. 
> When reading, the generated Thrift code (at least in Java) requires that readFieldBegin() pass back a TField with the id field set. It ignores the name field. Therefore the ids must appear in the stream. It's possible to contort these protocols to use ids instead of human-readable names (as TJSONProtocol does) but this isn't helpful in dealing with prior BSON or JSON data that we're trying to back-port into Thrift schemata.
> However, the generated read() method already knows how to map names to ids. So I propose allowing a TProtocol's readFieldBegin() method to pass back a TField with the name set and no id set (indicated, say, by id==-1), and let the read() method figure out the id to then switch on. 
> In some cases we could also allow the TField to omit the type information, which, again, is not naturally present in JSON. (BSON does embed type information, but its type system does not align fully with Thrift's, so it can't be used without further context). If the field is unknown, the only use for the type is for skipping the field value. But protocols like JSON and BSON can skip fields without this type information, since fields are delimited in the protocol in a type-independent way.
> Basically, what I propose is that readFieldBegin() be allowed to pass back just an id or just a name (and, for some protocols, no type information), since that is all read() needs in order to figure out how to read or skip the field. 
> I'm wondering what the Thrift elders think of this. Has it been discussed? Thanks!
> PS This does have the downside that if Thrift were to implement a pass-through feature for unrecognized fields (so that new messages read with old protocol versions will serialize back out with no loss) - we wouldn't be able to preserve fields for which we only had a name and no id. Or rather, we wouldn't be able to write them out to a protocol that requires ids, like the binary protocols. However this feature doesn't exist anyway, and I don't know if it's on the roadmap. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira