Posted to dev@thrift.apache.org by Chet Murthy <mu...@gmail.com> on 2017/10/17 18:59:59 UTC

"iterated container types" and a nicer JSON wire protocol (2 of 2)

[A second note to follow up on the description of my questions around a new
JSON wire protocol.]

The most "intractable" problem with a nicer wire protocol seems to be
dealing with iterated (nested) containers, and specifically with their
element types.  Consider a message like:

struct Foo {
  1: required list<list<i32>> l ;
}

the generated read() code for member "l" looks like this:

{
  uint32_t _size76;
  ::apache::thrift::protocol::TType _etype79;
  xfer += iprot->readListBegin(_etype79, _size76);
  uint32_t _i80;
  for (_i80 = 0; _i80 < _size76; ++_i80)
    {
      {
        uint32_t _size81;
        ::apache::thrift::protocol::TType _etype84;
        xfer += iprot->readListBegin(_etype84, _size81);
        uint32_t _i85;
        for (_i85 = 0; _i85 < _size81; ++_i85)
          {
            ...
          }
        xfer += iprot->readListEnd();
      }
    }
  xfer += iprot->readListEnd();
}

In this code, readListBegin() is supposed to report both the size of the
list and the type of its elements.  The size gets used; the type does not
get used HERE (after all, the generated code already knows it).  BUT IT IS
USED in the generic skip() template functions, which skip data the reader
does not recognize and so have only the wire types to go on.  So:

(1) morally, it seems like readListBegin() MUST return a correct type

(2) in fact, both binary & compact protocols do this -- further evidence
that this is a contract that protocols should fulfil.
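
To make the skip() dependence concrete, here is a minimal sketch, not
Thrift's actual code.  ToyReader, Token and ElemType are hypothetical
stand-ins for a protocol and TType, and the "wire" is just a token vector
whose list headers carry an element type and a size, as the binary and
compact protocols do.  The skip() below has no schema, so the element type
reported by readListBegin() is the only thing telling it how to recurse.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for Thrift's TType; only the two values needed here.
enum class ElemType { I32, List };

// Toy wire format: a flat token stream whose list headers carry the element
// type and the size.
struct Token {
  enum Kind { ListBegin, I32Value, ListEnd };
  Kind kind;
  ElemType elemType;  // meaningful for ListBegin only
  uint32_t size;      // meaningful for ListBegin only
  int32_t value;      // meaningful for I32Value only
};

Token listBegin(ElemType t, uint32_t n) { return Token{Token::ListBegin, t, n, 0}; }
Token i32(int32_t v) { return Token{Token::I32Value, ElemType::I32, 0, v}; }
Token listEnd() { return Token{Token::ListEnd, ElemType::I32, 0, 0}; }

// Hypothetical reader over the toy token stream.
class ToyReader {
 public:
  explicit ToyReader(std::vector<Token> tokens) : tokens_(std::move(tokens)) {}

  void readListBegin(ElemType& elemType, uint32_t& size) {
    const Token& t = next(Token::ListBegin);
    elemType = t.elemType;
    size = t.size;
  }
  int32_t readI32() { return next(Token::I32Value).value; }
  void readListEnd() { next(Token::ListEnd); }
  std::size_t position() const { return pos_; }

 private:
  // Consumes one token, throwing if it is not of the expected kind.
  const Token& next(Token::Kind expected) {
    if (pos_ >= tokens_.size() || tokens_[pos_].kind != expected)
      throw std::runtime_error("unexpected token");
    return tokens_[pos_++];
  }
  std::vector<Token> tokens_;
  std::size_t pos_ = 0;
};

// Schema-less skip: it recurses purely on the element type the header reports.
void skip(ToyReader& in, ElemType type) {
  switch (type) {
    case ElemType::I32:
      in.readI32();
      return;
    case ElemType::List: {
      ElemType elem = ElemType::I32;
      uint32_t n = 0;
      in.readListBegin(elem, n);
      for (uint32_t i = 0; i < n; ++i) skip(in, elem);
      in.readListEnd();
      return;
    }
  }
}
```

Skipping the token stream for [[1,2,3],[4,5,6]] with skip(in, ElemType::List)
consumes all twelve tokens.  If the outer header claimed its elements were
I32, the first recursive call would meet a list header where it expected a
number, and this sketch would throw.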

But in any "nice" JSON protocol, it's going to be complicated to return
that type.  Consider a possible serialization of an instance of that struct:

{ "l": [ [1,2,3], [4,5,6] ] }

demarshalling this could proceed as follows:

(0) deserialize into a JSON DOM so we can find the sizes of the arrays above

(1) then the calls:

readStructBegin()
readFieldBegin() // read the "l" and ":", return field-id, field-type (1, T_LIST)
readListBegin()  // read first "[", return size, elem-type (2, T_LIST)
readListBegin()  // read second "[", return size, elem-type (3, T_I32)
readI32()        // read "1"
... etc ...

(2) readFieldBegin() can use lookaside data to map the string "l" to <1,
T_LIST>
(3) with some trickery, the first readListBegin() could do the same (and
since we deserialized into a JSON DOM already, computing size is easy)
(4) BUT without keeping full recursive tree type-data-structures around at
runtime, I don't see how the second readListBegin() can be
properly/correctly implemented.
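
As an illustration of the lookaside data in step (2), here is a minimal
sketch.  FieldInfo, fooFields() and lookupField() are hypothetical names,
and WireType stands in for TType; nothing here is generated by Thrift
today.  A flat table like this is enough for readFieldBegin(), but note
that it records only that "l" is a list, not what it is a list of, which is
exactly the information the second readListBegin() in step (4) lacks.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for Thrift's TType; only the values used here.
enum class WireType { I32, List };

// Flat lookaside entry: what readFieldBegin() needs for one field name.
struct FieldInfo {
  int16_t id;
  WireType type;  // top-level type only; says nothing about element types
};

// Hypothetical per-struct table for Foo, keyed by JSON field name.
const std::map<std::string, FieldInfo>& fooFields() {
  static const std::map<std::string, FieldInfo> table = {
      {"l", {1, WireType::List}},
  };
  return table;
}

// Returns true and fills `out` when the JSON key names a known field.
bool lookupField(const std::string& jsonKey, FieldInfo& out) {
  const auto& table = fooFields();
  auto it = table.find(jsonKey);
  if (it == table.end()) return false;
  out = it->second;
  return true;
}
```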

And this is a simple case.  If we consider something like map<string,
map<string, list<i32> > >, it seems pretty clear that the JSON protocol
module would have to be stepping thru a tree-structured state-machine in
sync with the generated read() method.

And that doesn't seem like a recipe for maintainability.
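
For a sense of the bookkeeping involved, here is a minimal sketch of such a
reader for the list<list<i32>> case only, under several assumptions:
TypeNode, JsonNode and TreeWalkingReader are hypothetical names, WireType
stands in for TType, the JSON DOM is hand-built rather than parsed, and maps
and structs are left out entirely.  The reader keeps a stack of (type node,
DOM node) pairs so that every readListBegin() can answer with an element
type taken from the type tree and a size taken from the DOM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for Thrift's TType.
enum class WireType { I32, List };

// Recursive type descriptor: list<list<i32>> becomes List -> List -> I32.
struct TypeNode {
  WireType type;
  std::vector<TypeNode> children;  // exactly one child (element type) for List
};

// Minimal JSON DOM: either a number or an array of nodes.
struct JsonNode {
  bool isArray;
  int32_t number;
  std::vector<JsonNode> items;
};

JsonNode num(int32_t v) { return JsonNode{false, v, {}}; }
JsonNode arr(std::vector<JsonNode> items) { return JsonNode{true, 0, std::move(items)}; }

// Walks the type tree and the DOM in lockstep.  The caller must keep both
// trees alive for as long as the reader is in use.
class TreeWalkingReader {
 public:
  TreeWalkingReader(const TypeNode& type, const JsonNode& value) {
    pending_.push_back(Cursor{&type, &value});
  }

  void readListBegin(WireType& elemType, uint32_t& size) {
    Cursor cur = pop();
    if (cur.type->type != WireType::List || !cur.value->isArray)
      throw std::runtime_error("expected a list");
    assert(!cur.type->children.empty());
    const TypeNode& elem = cur.type->children[0];
    elemType = elem.type;  // known from the type tree, even for an empty array
    size = static_cast<uint32_t>(cur.value->items.size());
    // Push elements in reverse so the first element is the next one read.
    for (std::size_t i = cur.value->items.size(); i > 0; --i)
      pending_.push_back(Cursor{&elem, &cur.value->items[i - 1]});
  }

  int32_t readI32() {
    Cursor cur = pop();
    if (cur.type->type != WireType::I32 || cur.value->isArray)
      throw std::runtime_error("expected an i32");
    return cur.value->number;
  }

  void readListEnd() {}  // nothing left to consume in this sketch

 private:
  struct Cursor {
    const TypeNode* type;
    const JsonNode* value;
  };
  Cursor pop() {
    if (pending_.empty()) throw std::runtime_error("read past end of value");
    Cursor c = pending_.back();
    pending_.pop_back();
    return c;
  }
  std::vector<Cursor> pending_;
};
```

Even this toy version has to mirror the generated read() call for call: any
mismatch between the order of calls and the order of nodes on the stack
shows up only as a runtime error.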

I do think there's another way to solve this problem, but I don't want to
address it until I've closed off this particular avenue of investigation.

Any comments/advice welcome.

Thanks,
--chet--