Posted to dev@thrift.apache.org by Chet Murthy <mu...@gmail.com> on 2017/10/17 18:37:21 UTC

"iterated container types" and a nicer JSON wire protocol (1 of 2)

Folks,

[At the suggestion of a couple of committers, I'm sending this to the dev@
mailing-list.  It's a little involved, and I'll follow up with a second
note with even more detail.  Please skip it if this is too far in the
weeds; apologies for the detail.  I really am looking for advice here ...]

TL;DR -- long note with details of issues I ran into.  Really, I'm looking
for whether this is hopeless and I should stop (which would be sad, b/c
things work for all but some special cases, explained below).

Hey, I'm hacking away, implementing "nicer JSON serialization" (and the
metadata support for it) and have run into some issues.  I'm not sure
whether the right place to ask about these is the mailing-list (I guess
"thrift-dev"?) or by emailing you directly.  If it'd be better to post to
the mailing-list, let me know and I'll do so.

In any case, the first level of adding nicer JSON serialization was
smooth.  It's straightforward to make it work for structs.  But for
list/map/set, it's messier, and perhaps impossible.

There are two issues, one of which is mitigable, and the other perhaps
insurmountable given the current design of Thrift.  I thought I'd list
them, and if you could give your advice/judgment, I'd appreciate it greatly.

(1) Thrift assumes that containers serialize by writing (a) their SIZE and
(b) their ELEMENT TYPES.  Obviously, if we want a pretty
(human-{writable,readable}) JSON format, that's a non-starter.  Nobody's
going to count map-entries when they're updating JSON documents.  Ditto
writing down element types.

  --> the (a) size issue is mitigable: use a JSON parser to parse the
entire document up front, and have the deserializer walk the resulting "DOM
tree".  Then, when readListBegin() is called, we can compute the length of
the list by counting the elements of the corresponding JSON array.

  --> For (b), we can also keep track of the expected type of a field when
we start deserializing it, so that readListBegin() can return the
element-type.  And similarly for map/set.
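A minimal Python sketch of both mitigations, assuming the whole document is parsed up front with the standard `json` module and that the deserializer has a per-field table of element types. `DomListReader`, `FIELD_ELEM_TYPES` and the snake_case method names are hypothetical stand-ins that only mimic the shape of TProtocol's `readFieldBegin()`/`readListBegin()`; they are not Thrift's API.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical per-field metadata: field name -> declared element type.
# A real implementation would derive this from the generated struct metadata.
FIELD_ELEM_TYPES: Dict[str, str] = {"tags": "string", "counts": "i32"}


class DomListReader:
    """Toy reader that walks a fully parsed JSON tree (the "DOM")."""

    def __init__(self, text: str) -> None:
        self.root: Dict[str, Any] = json.loads(text)  # parse the entire document first
        self.field = ""
        self.items: List[Any] = []
        self.pos = 0

    def read_field_begin(self, name: str) -> None:
        # Remember which field we are in, so its declared type can be looked up.
        self.field = name

    def read_list_begin(self) -> Tuple[str, int]:
        node = self.root[self.field]
        if not isinstance(node, list):
            raise ValueError(f"field {self.field!r} is not a JSON array")
        self.items, self.pos = node, 0
        # (a) size: count the children of the array node in the DOM.
        # (b) element type: taken from the field's declared type, not from the JSON.
        return FIELD_ELEM_TYPES[self.field], len(node)

    def read_value(self) -> Any:
        value = self.items[self.pos]
        self.pos += 1
        return value


if __name__ == "__main__":
    reader = DomListReader('{"tags": ["a", "b", "c"], "counts": [1, 2]}')
    reader.read_field_begin("tags")
    elem_type, size = reader.read_list_begin()
    print(elem_type, size, [reader.read_value() for _ in range(size)])
```

The JSON text itself carries neither counts nor type tags; both come from the parsed tree and the field metadata, which is what lets the document stay hand-editable.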

BUT (2) the part just above only works when containers are -directly- the
types of fields.  It's possible in Thrift to have "iterated containers"
(containers nested inside other containers), e.g.

  8: required list<list<string>> h,
  9: required list<set<i32>> i,
  10: required map<string, set<i32>> j,

I can't determine how "officially supported" these usages are, but I can't
see a feasible way to both

 (i) honor the TProtocol contract to its invoking code (typically generated)
 (ii) produce a "pretty" JSON serialization format for these types.
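To make the limitation concrete, here is a hypothetical Python tracker that, like the per-field approach above, records one element type per field. The tags mimic Thrift's TType values: `readListBegin()` hands back a `TList` whose element type is a single TType, so for `list<list<string>>` it is just LIST. Field `g` is invented for contrast; `h` and `i` follow the IDL above. `PerFieldTracker` and its methods are illustrative names, not Thrift's API. The outer list of `h` can be answered, but the nested `readListBegin()` call cannot.

```python
from typing import Dict, Optional

# Hypothetical per-field metadata, one level deep: field name -> element type tag.
# The tags mimic Thrift's TType values, which carry no nested type information
# (for list<list<string>> the element tag is just "LIST").
FIELD_ELEM_TTYPE: Dict[str, str] = {
    "g": "STRING",  # list<string>: a container directly as the field type
    "h": "LIST",    # list<list<string>>: an "iterated container"
    "i": "SET",     # list<set<i32>>: an "iterated container"
}


class PerFieldTracker:
    """Answers readListBegin()'s element type from per-field metadata only."""

    def __init__(self) -> None:
        self.field: Optional[str] = None
        self.depth = 0  # how many containers deep we are inside the current field

    def read_field_begin(self, name: str) -> None:
        self.field, self.depth = name, 0

    def read_list_begin(self) -> str:
        self.depth += 1
        if self.field is None or self.depth > 1:
            # A nested container: the per-field record says nothing about it.
            raise LookupError(f"element type unknown at depth {self.depth}")
        return FIELD_ELEM_TTYPE[self.field]

    def read_list_end(self) -> None:
        self.depth -= 1


if __name__ == "__main__":
    tracker = PerFieldTracker()
    tracker.read_field_begin("h")
    print(tracker.read_list_begin())  # outer list of h: answerable
    try:
        tracker.read_list_begin()     # inner list<string>: not answerable
    except LookupError as err:
        print(err)
```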

Now, if it were possible to forbid "iterated containers", I think
everything could work out.  I'd produce two "nicer JSON serializers":

(a) shipped with Thrift: a version that still needs the "size" of
containers, but has no type information and uses field names instead of
field IDs

(b) as a contrib: a version that doesn't need the size of containers, but
uses a full JSON parser to build a DOM before deserializing.
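A hypothetical Python sketch of what the two variants might look like on the wire, for a record with no iterated containers. The size-prefixed array layout used for (a) is an assumption for illustration, not a settled format; `to_nicer_json` is an invented name, and maps and sets are left out for brevity.

```python
import json
from typing import Any


def to_nicer_json(value: Any, with_sizes: bool) -> Any:
    """Convert a toy struct (a dict keyed by field name) into JSON-ready data.

    with_sizes=True:  variant (a), every list is prefixed with its element count.
    with_sizes=False: variant (b), plain JSON arrays; a DOM-walking reader
                      recovers the count itself.
    """
    if isinstance(value, dict):
        # Structs: field names instead of field IDs, and no type tags.
        return {name: to_nicer_json(v, with_sizes) for name, v in value.items()}
    if isinstance(value, list):
        items = [to_nicer_json(v, with_sizes) for v in value]
        return [len(items), *items] if with_sizes else items
    return value  # scalars pass through unchanged


record = {"name": "x", "tags": ["a", "b", "c"], "inner": {"ids": [1, 2]}}
print(json.dumps(to_nicer_json(record, with_sizes=True)))
# {"name": "x", "tags": [3, "a", "b", "c"], "inner": {"ids": [2, 1, 2]}}
print(json.dumps(to_nicer_json(record, with_sizes=False)))
# {"name": "x", "tags": ["a", "b", "c"], "inner": {"ids": [1, 2]}}
```

Variant (a) stays readable but still asks a human editor to keep counts in sync; variant (b) drops that burden at the cost of parsing the whole document first.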

OK, long note.  Maybe I should be sending this to thrift-dev?

In any case, thanks for your advice on all this.

Cheers,
--chet--