Posted to user@thrift.apache.org by Randy Abernethy <ra...@apache.org> on 2017/10/02 16:41:37 UTC

Re: Human-readable wire-format for Thrift?

Hi Chet,

You say there is no mapping between the field names and type/ids, yet every
struct (including param structs) hands just such data to the proto on
write. Why are the field string names supplied to the
TProtocol::writeFieldBegin method by the generated struct code
insufficient? The write code passes the proto the field name, type and id;
and the read is offered the opportunity to return them. Sounds like
everything your new protocol would need is supplied. As per Jens you just
need to serialize the data provided the way you want it (swapping field
names for ids).

What am I missing (I'm guessing something :-)?

For example, this Thrift IDL (note the field names f1 and f2):

>>>>>>>>>>>>

struct data {
    1: i16 f1
    2: i16 f2
}

<<<<<<<<<<<<<

generates c++ write:

>>>>>>>>>>

uint32_t data::write(::apache::thrift::protocol::TProtocol* oprot) const {
  uint32_t xfer = 0;
  ::apache::thrift::protocol::TOutputRecursionTracker tracker(*oprot);
  xfer += oprot->writeStructBegin("data");

  xfer += oprot->writeFieldBegin("f1", ::apache::thrift::protocol::T_I16, 1);
  xfer += oprot->writeI16(this->f1);
  xfer += oprot->writeFieldEnd();

  xfer += oprot->writeFieldBegin("f2", ::apache::thrift::protocol::T_I16, 2);
  xfer += oprot->writeI16(this->f2);
  xfer += oprot->writeFieldEnd();

  xfer += oprot->writeFieldStop();
  xfer += oprot->writeStructEnd();
  return xfer;
}

<<<<<<<<<<<<<

and read:

>>>>>>>>>>>>>

uint32_t data::read(::apache::thrift::protocol::TProtocol* iprot) {

  ::apache::thrift::protocol::TInputRecursionTracker tracker(*iprot);
  uint32_t xfer = 0;
  std::string fname;
  ::apache::thrift::protocol::TType ftype;
  int16_t fid;

  xfer += iprot->readStructBegin(fname);

  using ::apache::thrift::protocol::TProtocolException;

  while (true)
  {
    xfer += iprot->readFieldBegin(fname, ftype, fid);
    if (ftype == ::apache::thrift::protocol::T_STOP) {
      break;
    }
    switch (fid)
    {
      case 1:
        if (ftype == ::apache::thrift::protocol::T_I16) {
          xfer += iprot->readI16(this->f1);
          this->__isset.f1 = true;
        } else {
          xfer += iprot->skip(ftype);
        }
        break;
      case 2:
        if (ftype == ::apache::thrift::protocol::T_I16) {
          xfer += iprot->readI16(this->f2);
          this->__isset.f2 = true;
        } else {
          xfer += iprot->skip(ftype);
        }
        break;
      default:
        xfer += iprot->skip(ftype);
        break;
    }
    xfer += iprot->readFieldEnd();
  }

  xfer += iprot->readStructEnd();

  return xfer;
}

<<<<<<<<<<<<<
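
The write path can also be sketched in isolation. This is a minimal sketch, assuming a stand-in class NameWriter (a hypothetical name, not Thrift's TProtocol) and a local FieldType enum; it only mirrors the writeFieldBegin(name, type, id) call shape shown above and emits the field name where a binary protocol would emit the id:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Local stand-in for the Thrift type enum (assumption, for illustration only).
enum class FieldType { Stop, I16 };

// Hypothetical text writer; it mimics the call shape of the generated
// write code but is not derived from Thrift's TProtocol.
class NameWriter {
public:
  void writeStructBegin(const std::string& /*name*/) {
    out_ << "{";
    first_ = true;
  }
  // The generated code supplies name, type and id; this writer keeps only
  // the name and drops the type and id from the output.
  void writeFieldBegin(const std::string& name, FieldType /*type*/,
                       int16_t /*id*/) {
    if (!first_) out_ << ",";
    first_ = false;
    out_ << "\"" << name << "\":";
  }
  void writeI16(int16_t v) { out_ << v; }
  void writeFieldEnd() {}
  void writeFieldStop() {}
  void writeStructEnd() { out_ << "}"; }
  std::string str() const { return out_.str(); }

private:
  std::ostringstream out_;
  bool first_ = true;
};

// Hand-written equivalent of the generated struct, reduced to the write side.
struct Data {
  int16_t f1 = 0;
  int16_t f2 = 0;

  void write(NameWriter& oprot) const {
    oprot.writeStructBegin("data");
    oprot.writeFieldBegin("f1", FieldType::I16, 1);
    oprot.writeI16(f1);
    oprot.writeFieldEnd();
    oprot.writeFieldBegin("f2", FieldType::I16, 2);
    oprot.writeI16(f2);
    oprot.writeFieldEnd();
    oprot.writeFieldStop();
    oprot.writeStructEnd();
  }
};

// Convenience helper: serialize one Data value to text.
std::string toText(const Data& d) {
  NameWriter w;
  d.write(w);
  return w.str();
}
```

With f1 = 1 and f2 = 2, toText returns {"f1":1,"f2":2}. Nested structs and string escaping are left out to keep the sketch short.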


--Randy


On Sat, Sep 30, 2017 at 6:38 AM, Edward Capriolo <ed...@gmail.com> wrote:

> Also I wonder if what is meant by human readable is simply a clever way to
> generate pcap modules so tools like Wireshark/tcpdump can read the data.
>
>
>
> On Thu, Sep 28, 2017 at 3:49 PM, Jens Geyer <je...@hotmail.com> wrote:
>
> > Hi Chet,
> >
> > well, Thrift is primarily about efficiency, not human readability. If
> > machines and programs talk to each other, nobody really needs human
> > readable messages, because there are no humans involved, except maybe
> > for debugging (but that's not a real production use case).  If one asked
> > you to pick just one single feature about any Serialization and RPC
> > library, potentially sacrificing any other requirement if needed, you
> > probably would answer that it should be as fast and efficient as possible.
> >
> > I only wonder if the human readability has something to do with the fact
> > that gRPC is often found to be slower than Thrift ...  ;-)
> >
> > You still want a human readable format? Ok, here's how to do it. Thrift
> > indeed offers the ability to achieve that, because it is a framework. For
> > example, look at the implementation of the TSimpleJSONProtocol (link
> > below) and use this as a starting point to write your own JSON-like
> > TProtocol implementation that suits your needs. That's what makes Thrift
> > so flexible - even if you have special needs, you need to replace only
> > those parts and it still simply works. If you prefer XML or some other
> > format, even that should be feasible, but you have to invest some work
> > either way.
> >
> > https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/protocol/TSimpleJSONProtocol.java
> >
> > Does that help you?
> >
> > Have fun,
> > JensG
> >
> >
> > -----Original Message-----
> > From: Chet Murthy
> > Sent: Thursday, September 28, 2017 3:04 AM
> > To: user@thrift.apache.org
> > Subject: Human-readable wire-format for Thrift?
> >
> > [I hope I'm sending this mail to the right list -- it wasn't clear to me
> > that it should go to thrift-dev, so I figured I'd send it here first.]
> >
> > The -one- thing that protobufs has going for it, over Thrift, is that
> > protobufs has "CompactTextFormat" (and JSON too) as full wire-formats.
> > This is .... incredibly useful for the following use-case:
> >
> > You want to write a config-file format, and you want to get the benefits
> > of version-to-version compatibility.  In your program, you'd like to
> > access a strongly-typed "config object" with typed fields, and you'd
> > -like- for marshalling to/from flat-text to be automatically generated.
> >
> > I have personal experience with using protobufs in exactly this way, and
> > it's really, really, really nice.
> >
> > The current Thrift JSON protocol isn't designed for this, and given the
> > interface of the (C++) TProtocol class, I think it isn't possible.  But
> > with a small change, it -would- be possible, so I thought I'd describe
> > the change, and see what you all thought (b/c it would require a change
> > to generated code, and to the TProtocol base class interfaces,
> > specifically to the readFieldBegin method):
> >
> > [I'll describe this for the C++ generated code; I haven't looked
> > carefully into the rest of the languages, but I'd guess that something
> > could be done.]
> >
> > (0) Let me first note that these datastructures are constant, and we're
> > talking about passing an extra parameter to the read method listed above.
> > That's it.
> >
> > (1) For concreteness, imagine a couple of message types
> >
> > struct Bar {
> >   4: required i32 a ,
> >   5: required string b,
> > }
> >
> > struct Foo {
> >   1: required i32 a ,
> >   2: required string b,
> >   3: required Bar c,
> > }
> >
> > Again for concreteness, here's an example of the JSON protocol for a
> > value of type Foo:
> >
> > {
> >     "1": {
> >         "i32": 1
> >     },
> >     "2": {
> >         "str": "ugh"
> >     },
> >     "3": {
> >         "rec": {
> >             "4": {
> >                 "i32": 2
> >             },
> >             "5": {
> >                 "str": "argh"
> >             }
> >         }
> >     }
> > }
> >
> > (2) I'd prefer that that look like:
> > {
> >     "a": 1,
> >     "b": "ugh",
> >     "c": {
> >         "a": 2,
> >         "b": "argh"
> >     }
> > }
> >
> > (3) For each message-type, we need a mapping field-name ->
> > pair<Thrift-type, field-id>.  So, generate a constant data-structure of
> > type
> >
> > map<string, pair<Type, int16_t> >
> >
> > for each message-type.
> >
> > (4) Marshalling is easy -- all the field-names are known, and we could
> > just emit those instead of field-ids; similarly, we could skip putting
> > type-information in the wire-format too.
> >
> > (5) At demarshalling time, we always know the type of the message we're
> > demarshalling.  So as we read field-names, we can use the map in #3 to
> > look up TType and field-id, and then just demarshal in the normal way.
> > We just need to pass that map as a constref to readFieldBegin.
> >
> > I -think- that that works, and can't find any problems with what I've
> > described.
> >
> > I can make this change to the C++ library and code-generator, but before
> > I start down that path, I figured I should get some input on whether
> > this is something that the Thrift community (and maintainers) would
> > accept?
> >
> > I think that a human-readable/writable wire would be immensely valuable,
> > and not just for the example of config-files.
> >
> > Your feedback appreciated,
> > --chet--
> >
> >
>

Re: Human-readable wire-format for Thrift?

Posted by Chet Murthy <mu...@gmail.com>.
Oops, I should have added, that I would of course implement
TNiceJSONProtocol (sic) and would be happy to -also- implement
TCompactTextProtocol (the analogue to protobuf CompactText) for all the
languages I address, both alone and in-concert with others.

On Mon, Oct 2, 2017 at 7:40 PM, Chet Murthy <mu...@gmail.com> wrote:

> I have sufficient familiarity/mastery of:
>
>   golang, c++, ocaml, perl, java, python
>
> that I can own doing these as a ... "demonstration" of this change.  That
> is, I can do these, as part of the proof that this is a good idea, and
> -before- you agree to accept (of course!)
>
> There's a lotta languages there, so I can't do all of them.  But I think
> it's very reasonable that before you say "yes", you see an important subset
> of languages addressed completely, and of course this means test-coverage
> -too-.
>
> Also, if the first round of languages look "ok", I'm happy to work with
> people who know the rest of the languages well, to get appropriate changes
> for all the other languages.
>
> Uh .... if one of you could let me know if this is sufficient, I'll get
> busy hacking.   Again: not looking for "we'll accept if ...".  rather,
> "we'll seriously consider the change if you do x,y,z".  That's all.
>
> Just tryin' to avoid wasting time (if it can be avoided) is all.
>
> --chet--

Re: Human-readable wire-format for Thrift?

Posted by Chet Murthy <mu...@gmail.com>.
I have sufficient familiarity/mastery of:

  golang, c++, ocaml, perl, java, python

that I can own doing these as a ... "demonstration" of this change.  That
is, I can do these, as part of the proof that this is a good idea, and
-before- you agree to accept (of course!)

There's a lotta languages there, so I can't do all of them.  But I think
it's very reasonable that before you say "yes", you see an important subset
of languages addressed completely, and of course this means test-coverage
-too-.

Also, if the first round of languages look "ok", I'm happy to work with
people who know the rest of the languages well, to get appropriate changes
for all the other languages.

Uh .... if one of you could let me know if this is sufficient, I'll get
busy hacking.   Again: not looking for "we'll accept if ...".  rather,
"we'll seriously consider the change if you do x,y,z".  That's all.

Just tryin' to avoid wasting time (if it can be avoided) is all.

--chet--




On Mon, Oct 2, 2017 at 7:25 PM, Randy Abernethy <ra...@apache.org> wrote:

> It is an interesting idea and I personally see the utility.
>
> It is a big ask of course. There are 30 language generators in the
> Thrift IDL compiler and 28 language libraries.
>
> Interested to hear thoughts from other committers. I would be in
> favor if no one was against (Jens [Jens-G] and James [@jeking3]
> in particular) and if someone was committed to seeing the thing
> through with passing tests. I think it would be reasonable to go
> language by language (doesn't need to be added to everything
> at once) but would want to see at least several of the key languages
> updated and an actual protocol provided, otherwise we're just
> making noise in an already very complex project.
>
> My 2 cents

Re: Human-readable wire-format for Thrift?

Posted by Randy Abernethy <ra...@apache.org>.
It is an interesting idea and I personally see the utility.

It is a big ask of course. There are 30 language generators in the
Thrift IDL compiler and 28 language libraries.

Interested to hear thoughts from other committers. I would be in
favor if no one was against (Jens [Jens-G] and James [@jeking3]
in particular) and if someone was committed to seeing the thing
through with passing tests. I think it would be reasonable to go
language by language (doesn't need to be added to everything
at once) but would want to see at least several of the key languages
updated and an actual protocol provided, otherwise we're just
making noise in an already very complex project.

My 2 cents





On Mon, Oct 2, 2017 at 6:36 PM, Chet Murthy <mu...@gmail.com> wrote:

> Randy,
>
> That's it.  Looking at your example, I'd make only one comment:
>
> I would prefer to emit data-structures that go in both directions -- both
> fieldid->fieldname, and fieldname->fieldid, to minimize the need for
> computation at runtime (in your example, the data fieldid->fieldname was
> present, but not the other way around).  For the "nice" protocols, the
> fieldname->fieldid mapping is what's needed (at read-time), but heck,
> generating them both would be good, I think.
>
> I take your point about YAML.  There are a small number of nice
> human-readable wirelines at this point: YAML, CompactText, JSON.  [as an
> OCaml geek, S-expressions].  It would be nice to support all of them
> "nicely".
>
> --chet--

Re: Human-readable wire-format for Thrift?

Posted by Chet Murthy <mu...@gmail.com>.
Randy,

That's it.  Looking at your example, I'd make only one comment:

I would prefer to emit data-structures that go in both directions -- both
fieldid->fieldname, and fieldname->fieldid, to minimize the need for
computation at runtime (in your example, the data fieldid->fieldname was
present, but not the other way around).  For the "nice" protocols, the
fieldname->fieldid mapping is what's needed (at read-time), but heck,
generating them both would be good, I think.
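
To make the two directions concrete, here is a minimal sketch of what the compiler might emit for struct Foo from my first mail. Every name here (FieldInfo, fooFieldsByName, fooNamesById, lookupByName, the local FieldType enum) is hypothetical; nothing like this is generated today:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Local stand-in for the Thrift type enum (assumption, for illustration only).
enum class FieldType { I32, String, Struct };

// Type and id of one field, as the read side needs them.
struct FieldInfo {
  FieldType type;
  int16_t id;
};

// fieldname -> (type, fieldid): what a "nice" protocol needs at read time.
const std::map<std::string, FieldInfo>& fooFieldsByName() {
  static const std::map<std::string, FieldInfo> m = {
      {"a", {FieldType::I32, 1}},
      {"b", {FieldType::String, 2}},
      {"c", {FieldType::Struct, 3}},
  };
  return m;
}

// fieldid -> fieldname: the direction the write code already carries inline.
const std::map<int16_t, std::string>& fooNamesById() {
  static const std::map<int16_t, std::string> m = {
      {1, "a"}, {2, "b"}, {3, "c"}};
  return m;
}

// Resolve a field name read off the wire. Returns false for an unknown
// name so that the caller can skip the field.
bool lookupByName(const std::map<std::string, FieldInfo>& schema,
                  const std::string& name, FieldInfo& out) {
  auto it = schema.find(name);
  if (it == schema.end()) return false;
  out = it->second;
  return true;
}
```

Both tables are constant, so they can be built once and shared by every instance of the message type.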

I take your point about YAML.  There are a small number of nice
human-readable wirelines at this point: YAML, CompactText, JSON.  [as an
OCaml geek, S-expressions].  It would be nice to support all of them
"nicely".

--chet--

On Mon, Oct 2, 2017 at 5:54 PM, Randy Abernethy <ra...@apache.org> wrote:

>
> Proposed (cleanest/lowest impact) Fix:
> 1. Modify the compiler to emit a schema for every struct in the types files
> (struct types include: service method args, structs, exceptions and unions)
> 2. Modify TProtocol writeStructBegin() and readStructBegin() to accept a
> schema arg
> 3. Modify the compiler to pass the appropriate schema to all calls to
> TProtocol write/readStructBegin() **
>
> Upshot:
> 1. Existing protos would simply ignore the schema arg and new protos
> (text/JSON/YAML/whatever) [de]serialize field names as desired.
> 2. This would be a breaking change, though code could be brought up to date
> by recompiling IDL and building (without src changes).
> <<<<<
>
> ** I realize that only the read side needs the schema/ordinal mapping but
> something tells me we'll be sorry if we don't maintain the symmetry
>
> Did I miss anything or get anything wrong?
>

Re: Human-readable wire-format for Thrift?

Posted by Randy Abernethy <ra...@apache.org>.
Hey Chet,

I have thought on this a bit and considered some implementation
possibilities. I want to make sure I understand your view and get your
thoughts on a struct based approach to the problem.

Here's a summary:

>>>>>
Situation:
1. There are many uses for a simple text protocol, particularly
reading/writing configs
2. Embedding schema (or even just the ordinals) in the serialized stream is
undesirable [e.g. below could work today but is too messy]
3. Without embedded schema/ordinals there is no way for a plugin protocol
today to figure out what it is reading

Proposed (cleanest/lowest impact) Fix:
1. Modify the compiler to emit a schema for every struct in the types files
(struct types include: service method args, structs, exceptions and unions)
2. Modify TProtocol writeStructBegin() and readStructBegin() to accept a
schema arg
3. Modify the compiler to pass the appropriate schema to all calls to
TProtocol write/readStructBegin() **

Upshot:
1. Existing protos would simply ignore the schema arg and new protos
(text/JSON/YAML/whatever) [de]serialize field names as desired.
2. This would be a breaking change, though code could be brought up to date
by recompiling IDL and building (without src changes).
<<<<<

** I realize that only the read side needs the schema/ordinal mapping but
something tells me we'll be sorry if we don't maintain the symmetry
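
A rough sketch of the read side under this proposal, using the two-field struct from my earlier example. TextReader, Schema, FieldInfo and the local FieldType enum are all hypothetical stand-ins (not Thrift classes), and the input is assumed to be already tokenized into name/value pairs so that no parser is needed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Local stand-ins (assumptions, for illustration only).
enum class FieldType { Stop, I16, Unknown };
struct FieldInfo {
  FieldType type;
  int16_t id;
};
using Schema = std::map<std::string, FieldInfo>;

// Hypothetical reader over already-tokenized name/value pairs.
class TextReader {
public:
  explicit TextReader(std::vector<std::pair<std::string, int16_t>> fields)
      : fields_(std::move(fields)) {}

  // The proposed change: the generated code hands over the struct's schema.
  void readStructBegin(const Schema& schema) {
    schema_ = &schema;
    pos_ = 0;
  }

  // Reads the next field name and resolves type and id through the schema.
  void readFieldBegin(std::string& fname, FieldType& ftype, int16_t& fid) {
    if (pos_ == fields_.size()) {
      ftype = FieldType::Stop;
      fid = 0;
      return;
    }
    fname = fields_[pos_].first;
    auto it = schema_->find(fname);
    if (it == schema_->end()) {
      ftype = FieldType::Unknown;  // unknown name: caller skips the value
      fid = -1;
      return;
    }
    ftype = it->second.type;
    fid = it->second.id;
  }
  void readI16(int16_t& v) { v = fields_[pos_].second; }
  void skip() {}
  void readFieldEnd() { ++pos_; }

private:
  std::vector<std::pair<std::string, int16_t>> fields_;
  const Schema* schema_ = nullptr;
  std::size_t pos_ = 0;
};

struct Data {
  int16_t f1 = 0;
  int16_t f2 = 0;

  // The schema the compiler would emit for this struct.
  static const Schema& schema() {
    static const Schema s = {{"f1", {FieldType::I16, 1}},
                             {"f2", {FieldType::I16, 2}}};
    return s;
  }

  // Same loop shape as the generated read code: the switch is still on fid.
  void read(TextReader& iprot) {
    std::string fname;
    FieldType ftype = FieldType::Stop;
    int16_t fid = 0;
    iprot.readStructBegin(schema());
    while (true) {
      iprot.readFieldBegin(fname, ftype, fid);
      if (ftype == FieldType::Stop) break;
      switch (fid) {
        case 1: iprot.readI16(f1); break;
        case 2: iprot.readI16(f2); break;
        default: iprot.skip(); break;
      }
      iprot.readFieldEnd();
    }
  }
};
```

The generated switch stays keyed on the field id; only the protocol changes how it arrives at that id, which is why existing protocols could ignore the extra argument.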

Did I miss anything or get anything wrong?

-Randy




{
    "__schema__": {
        "1": {
            "name": "a",
            "type": "i32"
        },
        "2": {
            "name": "b",
            "type": "str"
        },
        "3": {
            "name": "c",
            "type": {
                "1": {
                    "name": "a",
                    "type": "i32"
                },
                "2": {
                    "name": "b",
                    "type": "str"
                }
            }
        }
    },
    "a": 1,
    "b": "ugh",
    "c": {
        "a": 2,
        "b": "argh"
    }
}




On Mon, Oct 2, 2017 at 12:40 PM, Chet Murthy <mu...@gmail.com> wrote:

> Randy,
>
> [There -is- a way that this could be done without modifying the interface
> of TProtocol, but it's pretty involved/abstruse; I'll write about this in a
> second email.]
>
> I think that adding such self-describing type-information would vitiate the
> value of the format.  Let me try to convince you.
>
> TL;DR -- config-files need to be human-readable/editable, and maintaining
> what is arguably superfluous type-information in the config-file is never
> going to be appealing to humans.
>
> (1) in a former life, I worked for a company that used protobufs
> extensively for comms.  It was really really nice (for all the reasons that
> people who use Thrift a lot are aware of) -- you could count on data being
> strongly-typed, and yet the marshalling tech was efficient and had a
> modicum of version-to-version compatibility.
>
> (2) One thing that I found really lovely, was that they also expressed
> config-files in protobufs.  So almost everywhere, you found code dealing
> with configuration -objects-, and not YAML files, CSV, INI, etc.  There was
> -one- representation in-memory, and it was as protobuf objects.  This was
> great, for the same reason that storing data in protobufs was great.
>
> (3) Now, config-files are different from -data- -- Humans read and edit
> config-files.  So having a nice human-readable/editable format for
> protobufs, was critical to this use-case.  Turns out, protobufs 2.0 has
> one: "CompactText".  And protobufs 3.0 has added a new one: "JSON".  Both
> of these have the property that there's no (to the human) superfluous type
> information to keep accurately up-to-date.  In the "CompactText"
> representation, the example from my first email would look like:
>
> a: 1 b: "ugh" c: < a: 2 b: "argh" >
>
> So: a lot like JSON, but even more bare-bones.
>
> The key thing here, and in the JSON example, is that the config-file needs
> to be somewhat reasonably near to what a human would have written in the
> first place.  Adding type-information for the marshaller breaks that
> property.
>
> (4) Another use-case: where you could have an RPC server endpoint that
> presented a human-readable wireline, and users could invoke it (for
> learning, debugging, etc) using something as simple as "nc" and a
> shellscript, is also pretty much vitiated by requiring the invoker to pass
> along complex type-information.
>
> --chet--
>
>
>
> On Mon, Oct 2, 2017 at 10:20 AM, Randy Abernethy <randy.abernethy@gmail.com> wrote:
>
> > I see. Are you opposed to serializing the mapping? The proto could buffer
> > writes and collect the mappings, then on writeStructEnd() you could emit
> > the map (maybe as an __map__ attribute or something) followed by the
> > data. The read side could read the map in response to readStructBegin().
> > Not only would this require no mods to Thrift but it would have the added
> > advantage of making your wire format self describing. Kind of like Avro.
> >
> > Thoughts?
> >
> > On Mon, Oct 2, 2017 at 10:00 AM, Chet Murthy <mu...@gmail.com> wrote:
> >
> > > Randy,
> > >
> > > Thank you for your questions!  I'm hoping that I'm mistaken, and maybe
> > > via this conversation, you can help me figure out that indeed I am.
> > >
> > > (1) you're right, that the writeFieldBegin method is passed the
> > > field-name, so it can write it on the JSON wire.
> > >
> > > (2) the problem is, readFieldBegin can read that back, but it cannot
> > > *infer* the fieldid from that name, and the *fieldid* is what's used in
> > > generated code to drive the switch for demarshalling.  Concretely, in
> > > your example, even if "fname" were set to either "f1" or "f2", the
> > > switch logic is driven by fid being set to either 1 or 2.  And there's
> > > no way for that to happen in a TProtocol, and specifically
> > > TSimpleJSONProtocol doesn't do it.  But generally, there's no way for it
> > > to happen, b/c inferring fieldid from fieldname depends on which message
> > > is being demarshalled, and the *protocol* object doesn't have access to
> > > type-(IDL)-information at all.
> > >
> > > I haven't yet implemented the change I contemplate, only b/c I wanted
> > > to find out how open Thrift was, to such a change.  But I can do so, if
> > > it'll help to explain what I mean -- it isn't difficult.
> > >
> > > --chet--
> > >
> > > On Mon, Oct 2, 2017 at 9:41 AM, Randy Abernethy <ra...@apache.org> wrote:
> > >
> > > > Hi Chet,
> > > >
> > > > You say there is no mapping between the field names and type/ids, yet
> > > every
> > > > struct (including param structs) hands just such data to the proto on
> > > > write. Why are the field string names supplied to the
> > > > TProtocol::writeFieldBegin method by the generated struct code
> > > > insufficient? The write code passes the proto the field name, type
> and
> > > id;
> > > > and the read is offered the opportunity to return them. Sounds like
> > > > everything your new protocol would need is supplied. As per Jens you
> > just
> > > > need to serialize the data provided the way you want it (swapping
> field
> > > > names for ids).
> > > >
> > > > What am I missing (I'm guessing something :-)?
> > > >
> > > > For example, thrift IDL (notice bold items):
> > > >
> > > > >>>>>>>>>>>>
> > > >
> > > > struct data {
> > > >     1: i16 *f1*
> > > >     2: i16 *f2*
> > > > }
> > > >
> > > > <<<<<<<<<<<<<
> > > >
> > > > generates c++ write:
> > > >
> > > > >>>>>>>>>>
> > > >
> > > > uint32_t data::write(::apache::thrift::protocol::TProtocol* oprot)
> > > const {
> > > >   uint32_t xfer = 0;
> > > >   ::apache::thrift::protocol::TOutputRecursionTracker
> tracker(*oprot);
> > > >   xfer += oprot->writeStructBegin("data");
> > > >
> > > >   xfer += oprot->writeFieldBegin("*f1*",
> ::apache::thrift::protocol::T_
> > > > I16,
> > > > 1);
> > > >   xfer += oprot->writeI16(this->f1);
> > > >   xfer += oprot->writeFieldEnd();
> > > >
> > > >   xfer += oprot->writeFieldBegin("*f2*",
> ::apache::thrift::protocol::T_
> > > > I16,
> > > > 2);
> > > >   xfer += oprot->writeI16(this->f2);
> > > >   xfer += oprot->writeFieldEnd();
> > > >
> > > >   xfer += oprot->writeFieldStop();
> > > >   xfer += oprot->writeStructEnd();
> > > >   return xfer;
> > > > }
> > > >
> > > > <<<<<<<<<<<<<
> > > >
> > > > and read:
> > > >
> > > > >>>>>>>>>>>>>
> > > >
> > > > uint32_t data::read(::apache::thrift::protocol::TProtocol* iprot) {
> > > >
> > > >   ::apache::thrift::protocol::TInputRecursionTracker
> tracker(*iprot);
> > > >   uint32_t xfer = 0;
> > > >   std::string fname;
> > > >   ::apache::thrift::protocol::TType ftype;
> > > >   int16_t fid;
> > > >
> > > >   xfer += iprot->readStructBegin(fname);
> > > >
> > > >   using ::apache::thrift::protocol::TProtocolException;
> > > >
> > > >   while (true)
> > > >   {
> > > >     xfer += iprot->readFieldBegin(*fname*, ftype, fid);
> > > >     if (ftype == ::apache::thrift::protocol::T_STOP) {
> > > >       break;
> > > >     }
> > > >     switch (fid)
> > > >     {
> > > >       case 1:
> > > >         if (ftype == ::apache::thrift::protocol::T_I16) {
> > > >           xfer += iprot->readI16(this->f1);
> > > >           this->__isset.f1 = true;
> > > >         } else {
> > > >           xfer += iprot->skip(ftype);
> > > >         }
> > > >         break;
> > > >       case 2:
> > > >         if (ftype == ::apache::thrift::protocol::T_I16) {
> > > >           xfer += iprot->readI16(this->f2);
> > > >           this->__isset.f2 = true;
> > > >         } else {
> > > >           xfer += iprot->skip(ftype);
> > > >         }
> > > >         break;
> > > >       default:
> > > >         xfer += iprot->skip(ftype);
> > > >         break;
> > > >     }
> > > >     xfer += iprot->readFieldEnd();
> > > >   }
> > > >
> > > >   xfer += iprot->readStructEnd();
> > > >
> > > >   return xfer;
> > > > }
> > > >
> > > > <<<<<<<<<<<<<
> > > >
> > > >
> > > > --Randy
> > > >
> > > >
> > > > On Sat, Sep 30, 2017 at 6:38 AM, Edward Capriolo <
> > edlinuxguru@gmail.com>
> > > > wrote:
> > > >
> > > > > Also i wonder if what is meant by human readable is simple a clever
> > way
> > > > to
> > > > > generate pcap modules so tools like wireshark/tcp dump can read the
> > > data.
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Sep 28, 2017 at 3:49 PM, Jens Geyer <jensgeyer@hotmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Hi Chet,
> > > > > >
> > > > > > well, Thrift is primarily about efficiency, not human
> readability.
> > If
> > > > > > machines and programs talk to each other, nobody really needs
> human
> > > > > > readable
> > > > > > messages, because there are no humans involved, except maybe for
> > > > > debugging
> > > > > > (but that's not a real production use case).  If one asked you to
> > > pick
> > > > > just
> > > > > > one single feature about any Serialization and RPC library,
> > > potentially
> > > > > > sacrificing any other requirement if needed, you probably would
> > > answer
> > > > > that
> > > > > > it should be as fast and efficient as possible.
> > > > > >
> > > > > > I only wonder if the human readability has sth to do with the
> fact
> > > that
> > > > > > gRPC
> > > > > > is often found being slower than Thrift ...  ;-)
> > > > > >
> > > > > > > You still want a human readable format? Ok, here's how to do it.
> > > Thrift
> > > > > > indeed offers the ability to achieve that, because it is a
> > framework.
> > > > For
> > > > > > example, look at the implementation of the TSimpleJSONProtocol
> > (link
> > > > > below)
> > > > > > and use this as a starting point to write your own JSON-like
> > > TProtocol
> > > > > > implementation that suits your needs. That's what makes Thrift so
> > > > > flexible
> > > > > > -
> > > > > > even if you have special needs, you need to replace only those
> > parts
> > > > and
> > > > > it
> > > > > > still simply works. If you prefer XML or some other format, even
> > that
> > > > > > should
> > > > > > be feasible, but you have to invest some work either way.
> > > > > >
> > > > > > > https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/protocol/TSimpleJSONProtocol.java
> > > > > >
> > > > > > Does that help you?
> > > > > >
> > > > > > Have fun,
> > > > > > JensG
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > From: Chet Murthy
> > > > > > Sent: Thursday, September 28, 2017 3:04 AM
> > > > > > To: user@thrift.apache.org
> > > > > > Subject: Human-readable wire-format for Thrift?
> > > > > >
> > > > > > [I hope I'm sending this mail to the right list -- it wasn't
> clear
> > to
> > > > me
> > > > > > that it should go to thrift-dev, so I figured I'd send it here
> > > first.]
> > > > > >
> > > > > > The -one- thing that protobufs has going for it, over Thrift, is
> > that
> > > > > > protobufs has "CompactTextFormat" (and JSON too) as full
> > > wire-formats.
> > > > > > This is .... incredibly useful for the following use-case:
> > > > > >
> > > > > > You want to write a config-file format, and you want to get the
> > > > benefits
> > > > > of
> > > > > > version-to-version compatibility.  In your program, you'd like to
> > > > access
> > > > > a
> > > > > > strongly-typed "config object" with typed fields, and you'd
> -like-
> > > for
> > > > > > marshalling to/from flat-text to be automatically generated.
> > > > > >
> > > > > > I have personal experience with using protobufs in exactly this
> > way,
> > > > and
> > > > > > it's really, really, really nice.
> > > > > >
> > > > > > The current Thrift JSON protocol isn't designed for this, and
> given
> > > the
> > > > > > interface of the (C++) TProtocol class, I think it isn't
> possible.
> > > But
> > > > > > with a small change, it -would- be possible, so I thought I'd
> > > describe
> > > > > the
> > > > > > change, and see what you all thought (b/c it would require a
> change
> > > to
> > > > > > generated code, and to the TProtocol base class interfaces
> > > > (specifically
> > > > > to
> > > > > > the readFieldBegin method):
> > > > > >
> > > > > > [I'll describe this for the C++ generated code; I haven't looked
> > > > > carefully
> > > > > > into the rest of the languages, but I'd guess that something
> could
> > be
> > > > > > done.]
> > > > > >
> > > > > > (0) Let me first note that these datastructures are constant, and
> > > we're
> > > > > > talking about passing an extra parameter to the read method
> listed
> > > > above.
> > > > > > That's it.
> > > > > >
> > > > > > (1) For concreteness, imagine a couple of message types
> > > > > >
> > > > > > struct Bar {
> > > > > >   4: required i32 a ,
> > > > > >   5: required string b,
> > > > > > }
> > > > > >
> > > > > > struct Foo {
> > > > > >   1: required i32 a ,
> > > > > >   2: required string b,
> > > > > >   3: required Bar c,
> > > > > > }
> > > > > >
> > > > > > > Again for concreteness, here's an example of the JSON protocol
> > for a
> > > > > value
> > > > > > of type Foo:
> > > > > >
> > > > > > {
> > > > > >     "1": {
> > > > > >         "i32": 1
> > > > > >     },
> > > > > >     "2": {
> > > > > >         "str": "ugh"
> > > > > >     },
> > > > > >     "3": {
> > > > > >         "rec": {
> > > > > >             "4": {
> > > > > >                 "i32": 2
> > > > > >             },
> > > > > >             "5": {
> > > > > >                 "str": "argh"
> > > > > >             }
> > > > > >         }
> > > > > >     }
> > > > > > }
> > > > > >
> > > > > > (2) I'd prefer that that look like:
> > > > > > {
> > > > > >     "a": 1,
> > > > > >     "b": "ugh",
> > > > > >     "c": {
> > > > > >          "a": 2,
> > > > > >           "b": "argh"
> > > > > >     }
> > > > > > }
> > > > > >
> > > > > > (3) For each message-type, we need a mapping field-name ->
> > > > > > pair<Thrift-type, field-id>.  So, generate a constant
> > data-structure
> > > of
> > > > > > type
> > > > > >
> > > > > > map<string, pair<Type, int16_t> >
> > > > > >
> > > > > > for each message-type.
> > > > > >
> > > > > > (3) Marshalling is easy -- all the field-names are known, and we
> > > could
> > > > > just
> > > > > > emit those instead of field-ids; similarly, we could skip putting
> > > > > > type-information in the wire-format too.
> > > > > >
> > > > > > (4) At demarshalling time, we always know the type of the message
> > > we're
> > > > > > demarshalling.  So as we read field-names, we can use the map in
> #3
> > > to
> > > > > look
> > > > > > up TType and field-id, and then just demarshal in the normal way.
> > We
> > > > > just
> > > > > > need to pass that map as a constref to readFieldBegin.
> > > > > >
> > > > > > I -think- that that works, and can't find any problems with what
> > I've
> > > > > > described.
> > > > > >
> > > > > > I can make this change to the C++ library and code-generator, but
> > > > before
> > > > > I
> > > > > > start down that path, I figured I should get some input on
> whether
> > > this
> > > > > is
> > > > > > something that the Thrift community (and maintainers) would
> accept?
> > > > > >
> > > > > > I think that a human-readable/writable wire would be immensely
> > > > valuable,
> > > > > > and not just for the example of config-files.
> > > > > >
> > > > > > Your feedback appreciated,
> > > > > > --chet--
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Human-readable wire-format for Thrift?

Posted by Chet Murthy <mu...@gmail.com>.
Randy,

[There -is- a way that this could be done without modifying the interface
of TProtocol, but it's pretty involved/abstruse; I'll write about this in a
second email.]

I think that adding such self-describing type-information would vitiate the
value of the format.  Let me try to convince you.

TL;DR -- config-files need to be human-readable/editable, and maintaining
what is arguably superfluous type-information in the config-file is never
going to be appealing to humans.

(1) in a former life, I worked for a company that used protobufs
extensively for comms.  It was really really nice (for all the reasons that
people who use Thrift a lot are aware of) -- you could count on data being
strongly-typed, and yet the marshalling tech was efficient and had a
modicum of version-to-version compatibility.

(2) One thing that I found really lovely, was that they also expressed
config-files in protobufs.  So almost everywhere, you found code dealing
with configuration -objects-, and not YAML files, CSV, INI, etc.  There was
-one- representation in-memory, and it was as protobuf objects.  This was
great, for the same reason that storing data in protobufs was great.

(3) Now, config-files are different from -data- -- Humans read and edit
config-files.  So having a nice human-readable/editable format for
protobufs, was critical to this use-case.  Turns out, protobufs 2.0 has
one: "CompactText".  And protobufs 3.0 has added a new one: "JSON".  Both
of these have the property that there's no (to the human) superfluous type
information to keep accurately up-to-date.  In the "CompactText"
representation, the example from my first email would look like:

a: 1 b: "ugh" c: < a: 2 b: "argh" >

So: a lot like JSON, but even more bare-bones.

The key thing here, and in the JSON example, is that the config-file needs
to be somewhat reasonably near to what a human would have written in the
first place.  Adding type-information for the marshaller breaks that
property.

(4) Another use-case, where you could have an RPC server endpoint that
presented a human-readable wireline, and users could invoke it (for
learning, debugging, etc.) using something as simple as "nc" and a
shell script, is also pretty much vitiated by requiring the invoker to pass
along complex type-information.

--chet--
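
A minimal sketch of where the type information could live instead of in the
config file: a constant per-struct table from field name to (type, field-id),
as proposed in the first email of this thread. The enum, the table names
(kFooFields, kBarFields) and resolveField are hypothetical illustrations and
not part of Thrift; a real implementation would presumably use Thrift's own
TType. Note that "a" resolves to id 1 in Foo but to id 4 in Bar, which is why
the lookup has to know which struct is being read.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for Thrift's TType enum; illustrative only.
enum class FieldType { I32, String, Struct };

// Field name -> (type, field id); one constant table per struct.
using FieldMap = std::map<std::string, std::pair<FieldType, int16_t>>;

// Tables a code generator could emit for Foo and Bar from the first email.
const FieldMap kFooFields = {
    {"a", {FieldType::I32, 1}},
    {"b", {FieldType::String, 2}},
    {"c", {FieldType::Struct, 3}},
};
const FieldMap kBarFields = {
    {"a", {FieldType::I32, 4}},
    {"b", {FieldType::String, 5}},
};

// Resolve a name read from the text to the (type, id) pair that the
// generated switch statement expects; unknown names yield std::nullopt.
std::optional<std::pair<FieldType, int16_t>> resolveField(
    const FieldMap& fields, const std::string& name) {
  auto it = fields.find(name);
  if (it == fields.end()) return std::nullopt;
  return it->second;
}

// Usage: the same name maps to different ids depending on the struct.
inline void resolveFieldExample() {
  assert(resolveField(kFooFields, "a")->second == 1);
  assert(resolveField(kBarFields, "a")->second == 4);
  assert(!resolveField(kFooFields, "missing").has_value());
}
```

With a table like this on the reading side, a file such as
a: 1 b: "ugh" c: < a: 2 b: "argh" > needs no type annotations of its own.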



On Mon, Oct 2, 2017 at 10:20 AM, Randy Abernethy <ra...@gmail.com>
wrote:

> I see. Are you opposed to serializing the mapping? The proto could buffer
> writes and collect the mappings, then on writeStructEnd() you could emit
> the map (maybe as an __map__ attribute or something) followed by the data.
> The read side could read the map in response to readStructBegin(). Not only
> would this require no mods to Thrift but it would have the added advantage
> of making your wire format self describing. Kind of like Avro.
>
> Thoughts?
>
> On Mon, Oct 2, 2017 at 10:00 AM, Chet Murthy <mu...@gmail.com>
> wrote:
>
> > Randy,
> >
> > Thank you for your questions!  I'm hoping that I'm mistaken, and maybe via
> > this conversation, you can help me figure out that indeed I am.
> >
> > (1) you're right, that the writeFieldBegin method is passed the field-name,
> > so it can write it on the JSON wire.
> >
> > (2) the problem is, readFieldBegin can read that back, but it cannot
> > *infer* the fieldid from that name, and the *fieldid* is what's used in
> > generated code to drive the switch for demarshalling.  Concretely, in your
> > example, even if "fname" were set to either "f1" or "f2", the switch logic
> > is driven by fid being set to either 1 or 2.  And there's no way for that
> > to happen in a TProtocol, and specifically TSimpleJSONProtocol doesn't do
> > it.  But generally, there's no way for it to happen, b/c inferring fieldid
> > from fieldname depends on which message is being demarshalled, and the
> > *protocol* object doesn't have access to type-(IDL)-information at all.
> >
> > I haven't yet implemented the change I contemplate, only b/c I wanted to
> > find out how open Thrift was, to such a change.  But I can do so, if it'll
> > help to explain what I mean -- it isn't difficult.
> >
> > --chet--
> >
> > On Mon, Oct 2, 2017 at 9:41 AM, Randy Abernethy <ra...@apache.org> wrote:
> >
> > > Hi Chet,
> > >
> > > You say there is no mapping between the field names and type/ids, yet
> > every
> > > struct (including param structs) hands just such data to the proto on
> > > write. Why are the field string names supplied to the
> > > TProtocol::writeFieldBegin method by the generated struct code
> > > insufficient? The write code passes the proto the field name, type and
> > id;
> > > and the read is offered the opportunity to return them. Sounds like
> > > everything your new protocol would need is supplied. As per Jens you
> just
> > > need to serialize the data provided the way you want it (swapping field
> > > names for ids).
> > >
> > > What am I missing (I'm guessing something :-)?
> > >
> > > For example, thrift IDL (notice bold items):
> > >
> > > >>>>>>>>>>>>
> > >
> > > struct data {
> > >     1: i16 *f1*
> > >     2: i16 *f2*
> > > }
> > >
> > > <<<<<<<<<<<<<
> > >
> > > generates c++ write:
> > >
> > > >>>>>>>>>>
> > >
> > > uint32_t data::write(::apache::thrift::protocol::TProtocol* oprot)
> > const {
> > >   uint32_t xfer = 0;
> > >   ::apache::thrift::protocol::TOutputRecursionTracker tracker(*oprot);
> > >   xfer += oprot->writeStructBegin("data");
> > >
> > >   xfer += oprot->writeFieldBegin("*f1*", ::apache::thrift::protocol::T_
> > > I16,
> > > 1);
> > >   xfer += oprot->writeI16(this->f1);
> > >   xfer += oprot->writeFieldEnd();
> > >
> > >   xfer += oprot->writeFieldBegin("*f2*", ::apache::thrift::protocol::T_
> > > I16,
> > > 2);
> > >   xfer += oprot->writeI16(this->f2);
> > >   xfer += oprot->writeFieldEnd();
> > >
> > >   xfer += oprot->writeFieldStop();
> > >   xfer += oprot->writeStructEnd();
> > >   return xfer;
> > > }
> > >
> > > <<<<<<<<<<<<<
> > >
> > > and read:
> > >
> > > >>>>>>>>>>>>>
> > >
> > > uint32_t data::read(::apache::thrift::protocol::TProtocol* iprot) {
> > >
> > >   ::apache::thrift::protocol::TInputRecursionTracker tracker(*iprot);
> > >   uint32_t xfer = 0;
> > >   std::string fname;
> > >   ::apache::thrift::protocol::TType ftype;
> > >   int16_t fid;
> > >
> > >   xfer += iprot->readStructBegin(fname);
> > >
> > >   using ::apache::thrift::protocol::TProtocolException;
> > >
> > >   while (true)
> > >   {
> > >     xfer += iprot->readFieldBegin(*fname*, ftype, fid);
> > >     if (ftype == ::apache::thrift::protocol::T_STOP) {
> > >       break;
> > >     }
> > >     switch (fid)
> > >     {
> > >       case 1:
> > >         if (ftype == ::apache::thrift::protocol::T_I16) {
> > >           xfer += iprot->readI16(this->f1);
> > >           this->__isset.f1 = true;
> > >         } else {
> > >           xfer += iprot->skip(ftype);
> > >         }
> > >         break;
> > >       case 2:
> > >         if (ftype == ::apache::thrift::protocol::T_I16) {
> > >           xfer += iprot->readI16(this->f2);
> > >           this->__isset.f2 = true;
> > >         } else {
> > >           xfer += iprot->skip(ftype);
> > >         }
> > >         break;
> > >       default:
> > >         xfer += iprot->skip(ftype);
> > >         break;
> > >     }
> > >     xfer += iprot->readFieldEnd();
> > >   }
> > >
> > >   xfer += iprot->readStructEnd();
> > >
> > >   return xfer;
> > > }
> > >
> > > <<<<<<<<<<<<<
> > >
> > >
> > > --Randy
> > >
> > >
> > > On Sat, Sep 30, 2017 at 6:38 AM, Edward Capriolo <
> edlinuxguru@gmail.com>
> > > wrote:
> > >
> > > > Also I wonder if what is meant by human readable is simply a clever way
> > > > to generate pcap modules so tools like wireshark/tcpdump can read the
> > > > data.
> > > >
> > > >
> > > >
> > > > On Thu, Sep 28, 2017 at 3:49 PM, Jens Geyer <je...@hotmail.com>
> > > wrote:
> > > >
> > > > > Hi Chet,
> > > > >
> > > > > well, Thrift is primarily about efficiency, not human readability.
> If
> > > > > machines and programs talk to each other, nobody really needs human
> > > > > readable
> > > > > messages, because there are no humans involved, except maybe for
> > > > debugging
> > > > > (but that's not a real production use case).  If one asked you to
> > pick
> > > > just
> > > > > one single feature about any Serialization and RPC library,
> > potentially
> > > > > sacrificing any other requirement if needed, you probably would
> > answer
> > > > that
> > > > > it should be as fast and efficient as possible.
> > > > >
> > > > > I only wonder if the human readability has sth to do with the fact
> > that
> > > > > gRPC
> > > > > is often found being slower than Thrift ...  ;-)
> > > > >
> > > > > You still want a human readable format? Ok, here's how to do it.
> > Thrift
> > > > > indeed offers the ability to achieve that, because it is a
> framework.
> > > For
> > > > > example, look at the implementation of the TSimpleJSONProtocol
> (link
> > > > below)
> > > > > and use this as a starting point to write your own JSON-like
> > TProtocol
> > > > > implementation that suits your needs. That's what makes Thrift so
> > > > flexible
> > > > > -
> > > > > even if you have special needs, you need to replace only those
> parts
> > > and
> > > > it
> > > > > still simply works. If you prefer XML or some other format, even
> that
> > > > > should
> > > > > be feasible, but you have to invest some work either way.
> > > > >
> > > > > https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/protocol/TSimpleJSONProtocol.java
> > > > >
> > > > > Does that help you?
> > > > >
> > > > > Have fun,
> > > > > JensG
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Chet Murthy
> > > > > Sent: Thursday, September 28, 2017 3:04 AM
> > > > > To: user@thrift.apache.org
> > > > > Subject: Human-readable wire-format for Thrift?
> > > > >
> > > > > [I hope I'm sending this mail to the right list -- it wasn't clear
> to
> > > me
> > > > > that it should go to thrift-dev, so I figured I'd send it here
> > first.]
> > > > >
> > > > > The -one- thing that protobufs has going for it, over Thrift, is
> that
> > > > > protobufs has "CompactTextFormat" (and JSON too) as full
> > wire-formats.
> > > > > This is .... incredibly useful for the following use-case:
> > > > >
> > > > > You want to write a config-file format, and you want to get the
> > > benefits
> > > > of
> > > > > version-to-version compatibility.  In your program, you'd like to
> > > access
> > > > a
> > > > > strongly-typed "config object" with typed fields, and you'd -like-
> > for
> > > > > marshalling to/from flat-text to be automatically generated.
> > > > >
> > > > > I have personal experience with using protobufs in exactly this
> way,
> > > and
> > > > > it's really, really, really nice.
> > > > >
> > > > > The current Thrift JSON protocol isn't designed for this, and given
> > the
> > > > > interface of the (C++) TProtocol class, I think it isn't possible.
> > But
> > > > > with a small change, it -would- be possible, so I thought I'd
> > describe
> > > > the
> > > > > change, and see what you all thought (b/c it would require a change
> > to
> > > > > generated code, and to the TProtocol base class interfaces
> > > (specifically
> > > > to
> > > > > the readFieldBegin method):
> > > > >
> > > > > [I'll describe this for the C++ generated code; I haven't looked
> > > > carefully
> > > > > into the rest of the languages, but I'd guess that something could
> be
> > > > > done.]
> > > > >
> > > > > (0) Let me first note that these datastructures are constant, and
> > we're
> > > > > talking about passing an extra parameter to the read method listed
> > > above.
> > > > > That's it.
> > > > >
> > > > > (1) For concreteness, imagine a couple of message types
> > > > >
> > > > > struct Bar {
> > > > >   4: required i32 a ,
> > > > >   5: required string b,
> > > > > }
> > > > >
> > > > > struct Foo {
> > > > >   1: required i32 a ,
> > > > >   2: required string b,
> > > > >   3: required Bar c,
> > > > > }
> > > > >
> > > > > Again for concreteness, here's an example of the JSON protocol
> for a
> > > > value
> > > > > of type Foo:
> > > > >
> > > > > {
> > > > >     "1": {
> > > > >         "i32": 1
> > > > >     },
> > > > >     "2": {
> > > > >         "str": "ugh"
> > > > >     },
> > > > >     "3": {
> > > > >         "rec": {
> > > > >             "4": {
> > > > >                 "i32": 2
> > > > >             },
> > > > >             "5": {
> > > > >                 "str": "argh"
> > > > >             }
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > > (2) I'd prefer that that look like:
> > > > > {
> > > > >     "a": 1,
> > > > >     "b": "ugh",
> > > > >     "c": {
> > > > >          "a": 2,
> > > > >           "b": "argh"
> > > > >     }
> > > > > }
> > > > >
> > > > > (3) For each message-type, we need a mapping field-name ->
> > > > > pair<Thrift-type, field-id>.  So, generate a constant
> data-structure
> > of
> > > > > type
> > > > >
> > > > > map<string, pair<Type, int16_t> >
> > > > >
> > > > > for each message-type.
> > > > >
> > > > > (3) Marshalling is easy -- all the field-names are known, and we
> > could
> > > > just
> > > > > emit those instead of field-ids; similarly, we could skip putting
> > > > > type-information in the wire-format too.
> > > > >
> > > > > (4) At demarshalling time, we always know the type of the message
> > we're
> > > > > demarshalling.  So as we read field-names, we can use the map in #3
> > to
> > > > look
> > > > > up TType and field-id, and then just demarshal in the normal way.
> We
> > > > just
> > > > > need to pass that map as a constref to readFieldBegin.
> > > > >
> > > > > I -think- that that works, and can't find any problems with what
> I've
> > > > > described.
> > > > >
> > > > > I can make this change to the C++ library and code-generator, but
> > > before
> > > > I
> > > > > start down that path, I figured I should get some input on whether
> > this
> > > > is
> > > > > something that the Thrift community (and maintainers) would accept?
> > > > >
> > > > > I think that a human-readable/writable wire would be immensely
> > > valuable,
> > > > > and not just for the example of config-files.
> > > > >
> > > > > Your feedback appreciated,
> > > > > --chet--
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Human-readable wire-format for Thrift?

Posted by Jens Geyer <je...@hotmail.com>.
Hi Chet,


TL;DR:

Move on, send pull requests, iterate. We will probably crush it into pieces 
during review a few times :-), but it will be worth it.


LONG VERSION (read to the end, don't stop in between)


> I think you're incorrect.

Maybe. We all are. I was just asking, haven't read the whole conversation.


> (1) configuration data is -just- another form of data.

And executable programs are just another. Now what does that tell us? Should 
we write binaries with Thrift instead of PE or ELF? Could it be that that 
will be slightly less efficient? (but hey, it's human readable!).


> The reason I originally preferred Thrift to GRPC -- the reason I stick with
> Thrift, is that Thrift is -modular-.  It allows me to install my own
> transport, my own protocol. But actually, I can only install my own
> protocol *within* *limits*.  I MUST stick with field-ids as identifiers,
> and -cannot- use field-names.   And the fix for this is ... so trivial.
> And would affect the performance of other protocols ... not at all.

Then do it.


> (2) I'm not the first person to note that the Thrift JSON protocol is
> horrendous,

Sure. I agree with that. I would not call it "horrendous" though, but yes,
there's room for improvement. For example, I never understood why the field
types have to be three-letter codes. Why don't we write the numeric types as
we do with other formats?


> and what's really needed is some sort of Thrift JSON protocol
> that looks like idiomatic JSON.

Here I tend to disagree. As I see it, we already have two JSON formats, but ...
- the efficient one could still be better (i.e. even less bloated)
- the readable one could be more ... idiomatic, if we stick to that term.
More on that below.

But these would still need to be two formats, plus at least one (the current
TSimple) for compatibility reasons. I see no way we can have best
efficiency and human readability in one. These are contradicting goals, so
we still need two kinds of JSON.


> that the JSON protocol that emerges, would be idiomatic JSON.
> This is not without value.

"Added value" invites the question "what for?". For performance? For human
readability? Save the whales? World peace? And what do I have to sacrifice
while doing that? Without the "... for XYZ" part, any statement about "added
value" becomes meaningless. And the use of "idiomatic" does not make it any
better to me.


>  GRPC/Proto3 is a "we know how you should use RPC systems" answer.  It
> -prescribes- answers to every question you might ask, and if your answer is
> different, well sucks to be you.  THIS IS BAD.

Yeah, that's Google's approach. Take it or leave it. If you do not like it,
you just have the wrong problems.


> Look: again, I'm a rabid partisan of Thrift, and think that GRPC is a
> massive mistake.  This doesn't mean that I'm blind to the good ideas that
> Google came up with.

You're welcome, you came to the right project. Honestly.


>  Thrift is -only- and -ever- and -forever- an RPC system.  It is never
> going to be for de/marshalling data-at-rest.  It is never going to
> -attempt- to provide (at a performance cost, surely) a universal type
> system for data.  No, that's not what Thrift is about.
>
> Do you really want to sign up for that?

Do we need PDF generation capabilities in grep?

Thrift is a small, but open framework. It does one thing (actually two) and
does it well. I like that approach, maybe because I'm a big fan of the
KISS principle. That does not mean that I'm not open to improvements, and
we surely have plenty of room for that. For example, with our release
process, to name just one.


Have fun,
JensG


-----Original Message-----
From: Chet Murthy
Sent: Tuesday, October 3, 2017 2:47 AM
To: user@thrift.apache.org
Subject: Re: Human-readable wire-format for Thrift?

Jens,

I think you're incorrect.  Please let me try to convince you.

(1) configuration data is -just- another form of data.  In many, many
systems, there is special code that reads config-files in whatever format
was chosen (YAML?  INI?) and transforms it into custom application-specific
data-structures.  This code has to be maintained and tested.  It is a form
of -demarshalling- code.

And then when you get into multi-language systems, you have to write that
code more-than-once.  Maintain it as your config changes.  Oh, groan.

You may say that Thrift is the -wrong- tool for this job.  But

(2) I'm not the first person to note that the Thrift JSON protocol is
horrendous, and what's really needed is some sort of Thrift JSON protocol
that looks like idiomatic JSON.  -An- advantage of what I propose, is that
the JSON protocol that emerges, would be idiomatic JSON.  This is not
without value.

(3) The change I'm suggesting is both minor, and adds function that your
-most- -popular- competitor already supports.

  Look: please, please understand me.  I HAVE GRAVE RESERVATIONS ABOUT
PROTO3.  Especially in Golang.  Holey moley.

  GRPC/Proto3 is a "we know how you should use RPC systems" answer.  It
-prescribes- answers to every question you might ask, and if your answer is
different, well sucks to be you.  THIS IS BAD.

The reason I originally preferred Thrift to GRPC -- the reason I stick with
Thrift, is that Thrift is -modular-.  It allows me to install my own
transport, my own protocol. But actually, I can only install my own
protocol *within* *limits*.  I MUST stick with field-ids as identifiers,
and -cannot- use field-names.   And the fix for this is ... so trivial.
And would affect the performance of other protocols ... not at all.

Look: I can see why one might not want to make this change: it breaks
source-code compatibility (though that can be mitigated).

There is only one other argument I can see.  And that is this:

  Thrift is -only- and -ever- and -forever- an RPC system.  It is never
going to be for de/marshalling data-at-rest.  It is never going to
-attempt- to provide (at a performance cost, surely) a universal type
system for data.  No, that's not what Thrift is about.

Do you really want to sign up for that?
--chet--

P.S. Look: again, I'm a rabid partisan of Thrift, and think that GRPC is a
massive mistake.  This doesn't mean that I'm blind to the good ideas that
Google came up with.


On Mon, Oct 2, 2017 at 2:40 PM, Jens Geyer <je...@hotmail.com> wrote:

>
> Have you ever thought of using the wrong tool for the job?
>
>
> -----Original Message-----
> From: Chet Murthy
> Sent: Monday, October 2, 2017 9:54 PM
> To: user@thrift.apache.org
> Subject: Re: Human-readable wire-format for Thrift?
>
> Randy,
>
> There is a different way that one could allow for Thrift IDL metadata to be
> accessible at protocol-time (which is really what I'm proposing): one could
> pass the entire metadata information to the protocol constructor.  Pros &
> Cons:
>
> (1) for the config-file use-case I described, this is enough.
>
> (2) But this is NOT enough for the "human-readable wireline for Thrift
> servers".  I've noted that several times, people have asked for a JSON
> wire-format that they could use to invoke Thrift servers, that is ....
> well, more "normal JSON".  And this is not advanced by passing the IDL
> metadata to the T<NiceJSON>Protocol constructor, b/c that constructor is
> invoked from the TProtocolFactory, and there's no structure in place for
> figuring out the right metadata to pass.
>
> BUT (3) one COULD imagine a new Thrift RPC stack, which would be
> initialized with IDL metadata, and which would pass the right metadata down
> to the Protocol instances at creation time.
>
> This seems like much more work than the small change I'm proposing.
>
> I -DO- see that the biggest con of what I'm proposing, is that it MIGHT
> break source-code compatibility for all previously-generated Thrift code.
> I think this is remediable in the following way:
>
>   (a) add a new method readFieldBegin2(....) where the metadata is passed
> as a (possibly NULL) pointer
>
>   (b) old generated code continues to call readFieldBegin()
>
>   (c) old protocol impls merely define readFieldBegin2() to call
> readFieldBegin()
>
>   (d) new generated code calls readFieldBegin2 (with that metadata)
>
>   (e) and new protocols like TNiceJSONProtocol and TCompactTextProtocol
> would implement readFieldBegin2 to do what I described in my first note,
> while implementing readFieldBegin() (the old method) to panic with an
> error-message.
>
> Uh ... this seems to solve the source-code compatibility issue.  I don't
> have a good understanding of what level of binary-code compatibility Thrift
> promises, so I can't comment on that.
>
> Thoughts?
> --chet--
>
> 
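
The compatibility scheme in steps (a) through (e) quoted above can be
sketched roughly as follows. This is only an illustration under stated
assumptions: ProtocolSketch, IdProtocol, NameProtocol and FieldMap are
hypothetical stand-ins that do not reproduce the real TProtocol interface,
the "wire" is faked with fixed values, and throwing on an unknown field name
is a simplification.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins; NOT the real Thrift TProtocol API.
enum class FieldType { Stop, I16 };
using FieldMap = std::map<std::string, std::pair<FieldType, int16_t>>;

class ProtocolSketch {
public:
  virtual ~ProtocolSketch() = default;

  // (b) Existing entry point, still called by previously generated code.
  virtual void readFieldBegin(std::string& name, FieldType& type,
                              int16_t& id) = 0;

  // (a) + (c) New entry point taking a possibly-null metadata pointer. The
  // default ignores it and forwards, so old protocols need no changes.
  virtual void readFieldBegin2(std::string& name, FieldType& type, int16_t& id,
                               const FieldMap* /*fields*/) {
    readFieldBegin(name, type, id);
  }
};

// An existing id-based protocol: overrides only the old method.
class IdProtocol : public ProtocolSketch {
public:
  void readFieldBegin(std::string& name, FieldType& type,
                      int16_t& id) override {
    name.clear();           // id-based formats carry no field name
    type = FieldType::I16;  // pretend type and id came off the wire
    id = 2;
  }
};

// (e) A name-based protocol: it needs the metadata, so the old method fails.
class NameProtocol : public ProtocolSketch {
public:
  explicit NameProtocol(std::string nextName)
      : nextName_(std::move(nextName)) {}

  void readFieldBegin(std::string&, FieldType&, int16_t&) override {
    throw std::logic_error("NameProtocol requires readFieldBegin2");
  }

  void readFieldBegin2(std::string& name, FieldType& type, int16_t& id,
                       const FieldMap* fields) override {
    if (fields == nullptr) throw std::logic_error("field metadata missing");
    name = nextName_;  // pretend this name was parsed from the text
    auto it = fields->find(name);
    if (it == fields->end()) throw std::runtime_error("unknown field: " + name);
    type = it->second.first;
    id = it->second.second;
  }

private:
  std::string nextName_;
};

// (d) What newly generated code would do: always call readFieldBegin2.
inline void exerciseSketch() {
  const FieldMap dataFields = {{"f1", {FieldType::I16, 1}},
                               {"f2", {FieldType::I16, 2}}};
  std::string name;
  FieldType type = FieldType::Stop;
  int16_t id = 0;

  IdProtocol byId;
  byId.readFieldBegin2(name, type, id, &dataFields);  // forwards to old method
  assert(id == 2);

  NameProtocol byName("f1");
  byName.readFieldBegin2(name, type, id, &dataFields);  // resolves "f1"
  assert(id == 1 && type == FieldType::I16);
}
```

In this sketch the generated switch on the field id stays exactly as it is
today; only the call that fills in the id changes.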


Re: Human-readable wire-format for Thrift?

Posted by Chet Murthy <mu...@gmail.com>.
Jens,

I think you're incorrect.  Please let me try to convince you.

(1) configuration data is -just- another form of data.  In many, many
systems, there is special code that reads config-files in whatever format
was chosen (YAML?  INI?) and transforms it into custom application-specific
data-structures.  This code has to be maintained and tested.  It is a form
of -demarshalling- code.

And then when you get into multi-language systems, you have to write that
code more-than-once.  Maintain it as your config changes.  Oh, groan.

You may say that Thrift is the -wrong- tool for this job.  But

(2) I'm not the first person to note that the Thrift JSON protocol is
horrendous, and what's really needed is some sort of Thrift JSON protocol
that looks like idiomatic JSON.  -An- advantage of what I propose, is that
the JSON protocol that emerges, would be idiomatic JSON.  This is not
without value.

(3) The change I'm suggesting is both minor, and adds function that your
-most- -popular- competitor already supports.

  Look: please, please understand me.  I HAVE GRAVE RESERVATIONS ABOUT
PROTO3.  Especially in Golang.  Holey moley.

  GRPC/Proto3 is a "we know how you should use RPC systems" answer.  It
-prescribes- answers to every question you might ask, and if your answer is
different, well sucks to be you.  THIS IS BAD.

The reason I originally preferred Thrift to GRPC -- the reason I stick with
Thrift, is that Thrift is -modular-.  It allows me to install my own
transport, my own protocol. But actually, I can only install my own
protocol *within* *limits*.  I MUST stick with field-ids as identifiers,
and -cannot- use field-names.   And the fix for this is ... so trivial.
And would affect the performance of other protocols ... not at all.

Look: I can see why one might not want to make this change: it breaks
source-code compatibility (though that can be mitigated).

There is only one other argument I can see.  And that is this:

  Thrift is -only- and -ever- and -forever- an RPC system.  It is never
going to be for de/marshalling data-at-rest.  It is never going to
-attempt- to provide (at a performance cost, surely) a universal type
system for data.  No, that's not what Thrift is about.

Do you really want to sign up for that?
--chet--

P.S. Look: again, I'm a rabid partisan of Thrift, and think that GRPC is a
massive mistake.  This doesn't mean that I'm blind to the good ideas that
Google came up with.


On Mon, Oct 2, 2017 at 2:40 PM, Jens Geyer <je...@hotmail.com> wrote:

>
> Have you ever thought of using the wrong tool for the job?
>
>
> -----Original Message-----
> From: Chet Murthy
> Sent: Monday, October 2, 2017 9:54 PM
> To: user@thrift.apache.org
> Subject: Re: Human-readable wire-format for Thrift?
>
> Randy,
>
> There is a different way that one could allow for Thrift IDL metadata to be
> accessible at protocol-time (which is really what I'm proposing): one could
> pass the entire metadata information to the protocol constructor.  Pros &
> Cons:
>
> (1) for the config-file use-case I described, this is enough.
>
> (2) But this is NOT enough for the "human-readable wireline for Thrift
> servers".  I've noted that several times, people have asked for a JSON
> wire-format that they could use to invoke Thrift servers, that is ....
> well, more "normal JSON".  And this is not advanced by passing the IDL
> metadata to the T<NiceJSON>Protocol constructor, b/c that constructor is
> invoked from the TProtocolFactory, and there's no structure in place for
> figuring out the right metadata to pass.
>
> BUT (3) one COULD imagine a new Thrift RPC stack, which would be
> initialized with IDL metadata, and which would pass the right metadata down
> to the Protocol instances at creation time.
>
> This seems like much more work than the small change I'm proposing.
>
> I -DO- see that the biggest con of what I'm proposing, is that it MIGHT
> break source-code compatibility for all previously-generated Thrift code.
> I think this is remediable in the following way:
>
>   (a) add a new method readFieldBegin2(....) where the metadata is passed
> as a (possibly NULL) pointer
>
>   (b) old generated code continues to call readFieldBegin()
>
>   (c) old protocol impls merely define readFieldBegin2() to call
> readFieldBegin()
>
>   (d) new generated code calls readFieldBegin2 (with that metadata)
>
>   (e) and new protocols like TNiceJSONProtocol and TCompactTextProtocol
> would implement readFieldBegin2 to do what I described in my first note,
> while implementing readFieldBegin() (the old method) to panic with an
> error-message.
>
> Uh ... this seems to solve the source-code compatibility issue.  I don't
> have a good understanding of what level of binary-code compatibility Thrift
> promises, so I can't comment on that.
>
> Thoughts?
> --chet--
>
>

Re: Human-readable wire-format for Thrift?

Posted by Jens Geyer <je...@hotmail.com>.
Have you ever thought of using the wrong tool for the job?


-----Original Message-----
From: Chet Murthy
Sent: Monday, October 2, 2017 9:54 PM
To: user@thrift.apache.org
Subject: Re: Human-readable wire-format for Thrift?

Randy,

There is a different way that one could allow for Thrift IDL metadata to be
accessible at protocol-time (which is really what I'm proposing): one could
pass the entire metadata information to the protocol constructor.  Pros & Cons:

(1) for the config-file use-case I described, this is enough.

(2) But this is NOT enough for the "human-readable wireline for Thrift
servers".  I've noted that several times, people have asked for a JSON
wire-format that they could use to invoke Thrift servers, that is ....
well, more "normal JSON".  And this is not advanced by passing the IDL
metadata to the T<NiceJSON>Protocol constructor, b/c that constructor is
invoked from the TProtocolFactory, and there's no structure in place for
figuring out the right metadata to pass.

BUT (3) one COULD imagine a new Thrift RPC stack, which would be
initialized with IDL metadata, and which would pass the right metadata down
to the Protocol instances at creation time.

This seems like much more work than the small change I'm proposing.

I -DO- see that the biggest con of what I'm proposing, is that it MIGHT
break source-code compatibility for all previously-generated Thrift code.
I think this is remediable in the following way:

  (a) add a new method readFieldBegin2(....) where the metadata is passed
as a (possibly NULL) pointer

  (b) old generated code continues to call readFieldBegin()

  (c) old protocol impls merely define readFieldBegin2() to call
readFieldBegin()

  (d) new generated code calls readFieldBegin2 (with that metadata)

  (e) and new protocols like TNiceJSONProtocol and TCompactTextProtocol
would implement readFieldBegin2 to do what I described in my first note,
while implementing readFieldBegin() (the old method) to panic with an
error-message.

Uh ... this seems to solve the source-code compatibility issue.  I don't
have a good understanding of what level of binary-code compatibility Thrift
promises, so I can't comment on that.

Thoughts?
--chet-- 


Re: Human-readable wire-format for Thrift?

Posted by Chet Murthy <mu...@gmail.com>.
Randy,

There is a different way that one could allow for Thrift IDL metadata to be
accessible at protocol-time (which is really what I'm proposing): one could
pass the entire metadata information to the protocol constructor.  Pros & Cons:

(1) for the config-file use-case I described, this is enough.

(2) But this is NOT enough for the "human-readable wireline for Thrift
servers".  I've noted that several times, people have asked for a JSON
wire-format that they could use to invoke Thrift servers, that is ....
well, more "normal JSON".  And this is not advanced by passing the IDL
metadata to the T<NiceJSON>Protocol constructor, b/c that constructor is
invoked from the TProtocolFactory, and there's no structure in place for
figuring out the right metadata to pass.

BUT (3) one COULD imagine a new Thrift RPC stack, which would be
initialized with IDL metadata, and which would pass the right metadata down
to the Protocol instances at creation time.

This seems like much more work than the small change I'm proposing.

I -DO- see that the biggest con of what I'm proposing, is that it MIGHT
break source-code compatibility for all previously-generated Thrift code.
I think this is remediable in the following way:

  (a) add a new method readFieldBegin2(....) where the metadata is passed
as a (possibly NULL) pointer

  (b) old generated code continues to call readFieldBegin()

  (c) old protocol impls merely define readFieldBegin2() to call
readFieldBegin()

  (d) new generated code calls readFieldBegin2 (with that metadata)

  (e) and new protocols like TNiceJSONProtocol and TCompactTextProtocol
would implement readFieldBegin2 to do what I described in my first note,
while implementing readFieldBegin() (the old method) to panic with an
error-message.
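
Steps (a) through (e) could be sketched roughly as follows. This is only an
illustration under stated assumptions: ProtocolBase, IdProtocol, NameProtocol
and FieldMap are hypothetical stand-ins rather than classes from the Thrift
library, only the read side is modelled, and the "wire" is faked with canned
values.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified stand-in for Thrift's TType enum (only the values used here).
enum TType { T_STOP = 0, T_I16 = 6 };

// Hypothetical per-struct metadata: field-name -> (type, field-id).
using FieldMap = std::map<std::string, std::pair<TType, int16_t>>;

// Hypothetical stand-in for the TProtocol base class (read side only).
class ProtocolBase {
 public:
  virtual ~ProtocolBase() = default;

  // (b) The existing method that old generated code keeps calling.
  virtual uint32_t readFieldBegin(std::string& name, TType& type,
                                  int16_t& id) = 0;

  // (a) + (c) The new method: by default it ignores the metadata and
  // forwards to the old one, so existing protocols need no changes.
  virtual uint32_t readFieldBegin2(std::string& name, TType& type, int16_t& id,
                                   const FieldMap* /*meta*/) {
    return readFieldBegin(name, type, id);
  }
};

// An id-based protocol in the old style; it overrides only readFieldBegin.
class IdProtocol : public ProtocolBase {
 public:
  uint32_t readFieldBegin(std::string& name, TType& type,
                          int16_t& id) override {
    name.clear();
    type = T_I16;  // pretend type and id were decoded from the wire
    id = 1;
    return 3;      // pretend three bytes were consumed
  }
};

// (e) A name-based protocol: the old method fails loudly, the new one
// resolves the name it read through the supplied metadata.
class NameProtocol : public ProtocolBase {
 public:
  explicit NameProtocol(std::string nextName) : next_(std::move(nextName)) {}

  uint32_t readFieldBegin(std::string&, TType&, int16_t&) override {
    throw std::logic_error("NameProtocol requires readFieldBegin2");
  }

  uint32_t readFieldBegin2(std::string& name, TType& type, int16_t& id,
                           const FieldMap* meta) override {
    if (meta == nullptr) throw std::logic_error("no field metadata supplied");
    name = next_;  // pretend this name was parsed from the wire
    auto it = meta->find(name);
    if (it == meta->end()) throw std::runtime_error("unknown field: " + name);
    type = it->second.first;
    id = it->second.second;
    return static_cast<uint32_t>(name.size());
  }

 private:
  std::string next_;
};

// (d) New generated code would call readFieldBegin2 on either kind of protocol.
void example() {
  const FieldMap meta = {{"f1", {T_I16, 1}}, {"f2", {T_I16, 2}}};
  std::string name;
  TType type = T_STOP;
  int16_t id = 0;
  IdProtocol oldStyle;
  oldStyle.readFieldBegin2(name, type, id, &meta);  // forwards to the old method
  assert(id == 1);
  NameProtocol newStyle("f2");
  newStyle.readFieldBegin2(name, type, id, &meta);  // looks "f2" up in meta
  assert(id == 2);
}
```

The default body of readFieldBegin2 is what would keep old protocol
implementations source-compatible in this sketch; a real skip path for unknown
field names is left out.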

Uh ... this seems to solve the source-code compatibility issue.  I don't
have a good understanding of what level of binary-code compatibility Thrift
promises, so I can't comment on that.

Thoughts?
--chet--

Re: Human-readable wire-format for Thrift?

Posted by Randy Abernethy <ra...@gmail.com>.
I see. Are you opposed to serializing the mapping? The proto could buffer
writes and collect the mappings, then on writeStructEnd() you could emit
the map (maybe as an __map__ attribute or something) followed by the data.
The read side could read the map in response to readStructBegin(). Not only
would this require no mods to Thrift but it would have the added advantage
of making your wire format self describing. Kind of like Avro.
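
A rough sketch of the write side of that idea, under stated assumptions:
SelfDescribingWriter is a hypothetical class, its method signatures are
simplified and are not the Thrift TProtocol API, only i16 values are handled,
and the "__map__" layout (name -> [type, id]) is just one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Buffers the fields of one struct, then emits the name -> [type, id] map
// followed by the data once the struct is complete.
class SelfDescribingWriter {
 public:
  void writeStructBegin(const std::string& /*structName*/) { fields_.clear(); }

  // Collect the mapping instead of writing anything immediately.
  void writeFieldBegin(const std::string& name, int type, int16_t id) {
    fields_.push_back({name, type, id, 0});
  }

  // Attach the value to the most recently begun field.
  void writeI16(int16_t value) {
    if (!fields_.empty()) fields_.back().value = value;
  }

  // Emit "__map__" first, then the fields keyed by name.
  std::string writeStructEnd() {
    std::ostringstream out;
    out << "{\"__map__\":{";
    for (std::size_t i = 0; i < fields_.size(); ++i) {
      if (i != 0) out << ",";
      out << "\"" << fields_[i].name << "\":[" << fields_[i].type << ","
          << fields_[i].id << "]";
    }
    out << "}";
    for (const Field& f : fields_) {
      out << ",\"" << f.name << "\":" << f.value;
    }
    out << "}";
    return out.str();
  }

 private:
  struct Field {
    std::string name;
    int type;
    int16_t id;
    int16_t value;
  };
  std::vector<Field> fields_;
};

// Usage with the `data` struct from the earlier example (6 stands for T_I16).
void example() {
  SelfDescribingWriter w;
  w.writeStructBegin("data");
  w.writeFieldBegin("f1", 6, 1);
  w.writeI16(10);
  w.writeFieldBegin("f2", 6, 2);
  w.writeI16(20);
  assert(w.writeStructEnd() ==
         "{\"__map__\":{\"f1\":[6,1],\"f2\":[6,2]},\"f1\":10,\"f2\":20}");
}
```

A reader would parse "__map__" in readStructBegin() and use it to turn each
following name back into a type and id.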

Thoughts?

On Mon, Oct 2, 2017 at 10:00 AM, Chet Murthy <mu...@gmail.com> wrote:

> Randy,
>
> Thank you for your questions!  I'm hoping that I'm mistaken, and maybe via
> this conversation, you can help me figure out that indeed I am.
>
> (1) you're right, that the writeFieldBegin method is passed the field-name,
> so it can write it on the JSON wire.
>
> (2) the problem is, readFieldBegin can read that back, but it cannot
> *infer* the fieldid from that name, and the *fieldid* is what's used in
> generated code to drive the switch for demarshalling.  Concretely, in your
> example, even if "fname" were set to either "f1" or "f2", the switch logic
> is driven by fid being set to either 1 or 2.  And there's no way for that
> to happen in a TProtocol, and specifically TSimpleJSONProtocol doesn't do
> it.  But generally, there's no way for it to happen, b/c inferring fieldid
> from fieldname depends on which message is being demarshalled, and the
> *protocol* object doesn't have access to type-(IDL)-information at all.
>
> I haven't yet implemented the change I contemplate, only b/c I wanted to
> find out how open Thrift was, to such a change.  But I can do so, if it'll
> help to explain what I mean -- it isn't difficult.
>
> --chet--
>
> On Mon, Oct 2, 2017 at 9:41 AM, Randy Abernethy <ra...@apache.org> wrote:
>
> > Hi Chet,
> >
> > You say there is no mapping between the field names and type/ids, yet
> every
> > struct (including param structs) hands just such data to the proto on
> > write. Why are the field string names supplied to the
> > TProtocol::writeFieldBegin method by the generated struct code
> > insufficient? The write code passes the proto the field name, type and
> id;
> > and the read is offered the opportunity to return them. Sounds like
> > everything your new protocol would need is supplied. As per Jens you just
> > need to serialize the data provided the way you want it (swapping field
> > names for ids).
> >
> > What am I missing (I'm guessing something :-)?
> >
> > For example, thrift IDL (notice bold items):
> >
> > >>>>>>>>>>>>
> >
> > struct data {
> >     1: i16 *f1*
> >     2: i16 *f2*
> > }
> >
> > <<<<<<<<<<<<<
> >
> > generates c++ write:
> >
> > >>>>>>>>>>
> >
> > > uint32_t data::write(::apache::thrift::protocol::TProtocol* oprot) const {
> >   uint32_t xfer = 0;
> >   ::apache::thrift::protocol::TOutputRecursionTracker tracker(*oprot);
> >   xfer += oprot->writeStructBegin("data");
> >
> > >   xfer += oprot->writeFieldBegin("*f1*", ::apache::thrift::protocol::T_I16, 1);
> >   xfer += oprot->writeI16(this->f1);
> >   xfer += oprot->writeFieldEnd();
> >
> > >   xfer += oprot->writeFieldBegin("*f2*", ::apache::thrift::protocol::T_I16, 2);
> >   xfer += oprot->writeI16(this->f2);
> >   xfer += oprot->writeFieldEnd();
> >
> >   xfer += oprot->writeFieldStop();
> >   xfer += oprot->writeStructEnd();
> >   return xfer;
> > }
> >
> > <<<<<<<<<<<<<
> >
> > and read:
> >
> > >>>>>>>>>>>>>
> >
> > uint32_t data::read(::apache::thrift::protocol::TProtocol* iprot) {
> >
> >   ::apache::thrift::protocol::TInputRecursionTracker tracker(*iprot);
> >   uint32_t xfer = 0;
> >   std::string fname;
> >   ::apache::thrift::protocol::TType ftype;
> >   int16_t fid;
> >
> >   xfer += iprot->readStructBegin(fname);
> >
> >   using ::apache::thrift::protocol::TProtocolException;
> >
> >   while (true)
> >   {
> >     xfer += iprot->readFieldBegin(*fname*, ftype, fid);
> >     if (ftype == ::apache::thrift::protocol::T_STOP) {
> >       break;
> >     }
> >     switch (fid)
> >     {
> >       case 1:
> >         if (ftype == ::apache::thrift::protocol::T_I16) {
> >           xfer += iprot->readI16(this->f1);
> >           this->__isset.f1 = true;
> >         } else {
> >           xfer += iprot->skip(ftype);
> >         }
> >         break;
> >       case 2:
> >         if (ftype == ::apache::thrift::protocol::T_I16) {
> >           xfer += iprot->readI16(this->f2);
> >           this->__isset.f2 = true;
> >         } else {
> >           xfer += iprot->skip(ftype);
> >         }
> >         break;
> >       default:
> >         xfer += iprot->skip(ftype);
> >         break;
> >     }
> >     xfer += iprot->readFieldEnd();
> >   }
> >
> >   xfer += iprot->readStructEnd();
> >
> >   return xfer;
> > }
> >
> > <<<<<<<<<<<<<
> >
> >
> > --Randy
> >
> >
> > On Sat, Sep 30, 2017 at 6:38 AM, Edward Capriolo <ed...@gmail.com>
> > wrote:
> >
> > > Also I wonder if what is meant by human readable is simply a clever way to
> > > generate pcap modules so tools like wireshark/tcp dump can read the data.
> > >
> > >
> > >
> > > On Thu, Sep 28, 2017 at 3:49 PM, Jens Geyer <je...@hotmail.com>
> > wrote:
> > >
> > > > Hi Chet,
> > > >
> > > > well, Thrift is primarily about efficiency, not human readability. If
> > > > machines and programs talk to each other, nobody really needs human
> > > > readable
> > > > messages, because there are no humans involved, except maybe for
> > > debugging
> > > > (but that's not a real production use case).  If one asked you to
> pick
> > > just
> > > > one single feature about any Serialization and RPC library,
> potentially
> > > > sacrificing any other requirement if needed, you probably would
> answer
> > > that
> > > > it should be as fast and efficient as possible.
> > > >
> > > > I only wonder if the human readability has sth to do with the fact
> that
> > > > gRPC
> > > > is often found being slower than Thrift ...  ;-)
> > > >
> > > > > You still want a human readable format? Ok, here's how to do it.
> Thrift
> > > > indeed offers the ability to achieve that, because it is a framework.
> > For
> > > > example, look at the implementation of the TSimpleJSONProtocol (link
> > > below)
> > > > and use this as a starting point to write your own JSON-like
> TProtocol
> > > > implementation that suits your needs. That's what makes Thrift so
> > > flexible
> > > > -
> > > > even if you have special needs, you need to replace only those parts
> > and
> > > it
> > > > still simply works. If you prefer XML or some other format, even that
> > > > should
> > > > be feasible, but you have to invest some work either way.
> > > >
> > > > > https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/protocol/TSimpleJSONProtocol.java
> > > >
> > > > Does that help you?
> > > >
> > > > Have fun,
> > > > JensG
> > > >
> > > >
> > > > > -----Original Message-----
> > > > From: Chet Murthy
> > > > Sent: Thursday, September 28, 2017 3:04 AM
> > > > To: user@thrift.apache.org
> > > > Subject: Human-readable wire-format for Thrift?
> > > >
> > > > [I hope I'm sending this mail to the right list -- it wasn't clear to
> > me
> > > > that it should go to thrift-dev, so I figured I'd send it here
> first.]
> > > >
> > > > The -one- thing that protobufs has going for it, over Thrift, is that
> > > > protobufs has "CompactTextFormat" (and JSON too) as full
> wire-formats.
> > > > This is .... incredibly useful for the following use-case:
> > > >
> > > > You want to write a config-file format, and you want to get the
> > benefits
> > > of
> > > > version-to-version compatibility.  In your program, you'd like to
> > access
> > > a
> > > > strongly-typed "config object" with typed fields, and you'd -like-
> for
> > > > marshalling to/from flat-text to be automatically generated.
> > > >
> > > > I have personal experience with using protobufs in exactly this way,
> > and
> > > > it's really, really, really nice.
> > > >
> > > > The current Thrift JSON protocol isn't designed for this, and given
> the
> > > > interface of the (C++) TProtocol class, I think it isn't possible.
> But
> > > > with a small change, it -would- be possible, so I thought I'd
> describe
> > > the
> > > > change, and see what you all thought (b/c it would require a change
> to
> > > > generated code, and to the TProtocol base class interfaces
> > (specifically
> > > to
> > > > the readFieldBegin method):
> > > >
> > > > [I'll describe this for the C++ generated code; I haven't looked
> > > carefully
> > > > into the rest of the languages, but I'd guess that something could be
> > > > done.]
> > > >
> > > > (0) Let me first note that these datastructures are constant, and
> we're
> > > > talking about passing an extra parameter to the read method listed
> > above.
> > > > That's it.
> > > >
> > > > (1) For concreteness, imagine a couple of message types
> > > >
> > > > struct Bar {
> > > >   4: required i32 a ,
> > > >   5: required string b,
> > > > }
> > > >
> > > > struct Foo {
> > > >   1: required i32 a ,
> > > >   2: required string b,
> > > >   3: required Bar c,
> > > > }
> > > >
> > > > > Again for concreteness, here's an example of the JSON protocol for a
> > > value
> > > > of type Foo:
> > > >
> > > > {
> > > >     "1": {
> > > >         "i32": 1
> > > >     },
> > > >     "2": {
> > > >         "str": "ugh"
> > > >     },
> > > >     "3": {
> > > >         "rec": {
> > > >             "4": {
> > > >                 "i32": 2
> > > >             },
> > > >             "5": {
> > > >                 "str": "argh"
> > > >             }
> > > >         }
> > > >     }
> > > > }
> > > >
> > > > (2) I'd prefer that that look like:
> > > > {
> > > >     "a": 1,
> > > >     "b": "ugh",
> > > >     "c": {
> > > >          "a": 2,
> > > >           "b": "argh"
> > > >     }
> > > > }
> > > >
> > > > (3) For each message-type, we need a mapping field-name ->
> > > > pair<Thrift-type, field-id>.  So, generate a constant data-structure
> of
> > > > type
> > > >
> > > > map<string, pair<Type, int16_t> >
> > > >
> > > > for each message-type.
> > > >
> > > > (3) Marshalling is easy -- all the field-names are known, and we
> could
> > > just
> > > > emit those instead of field-ids; similarly, we could skip putting
> > > > type-information in the wire-format too.
> > > >
> > > > (4) At demarshalling time, we always know the type of the message
> we're
> > > > demarshalling.  So as we read field-names, we can use the map in #3
> to
> > > look
> > > > up TType and field-id, and then just demarshal in the normal way.  We
> > > just
> > > > need to pass that map as a constref to readFieldBegin.
> > > >
> > > > I -think- that that works, and can't find any problems with what I've
> > > > described.
> > > >
> > > > I can make this change to the C++ library and code-generator, but
> > before
> > > I
> > > > start down that path, I figured I should get some input on whether
> this
> > > is
> > > > something that the Thrift community (and maintainers) would accept?
> > > >
> > > > I think that a human-readable/writable wire would be immensely
> > valuable,
> > > > and not just for the example of config-files.
> > > >
> > > > Your feedback appreciated,
> > > > --chet--
> > > >
> > > >
> > >
> >
>

Re: Human-readable wire-format for Thrift?

Posted by Chet Murthy <mu...@gmail.com>.
Randy,

Thank you for your questions!  I'm hoping that I'm mistaken, and maybe via
this conversation, you can help me figure out that indeed I am.

(1) you're right, that the writeFieldBegin method is passed the field-name,
so it can write it on the JSON wire.

(2) the problem is, readFieldBegin can read that back, but it cannot
*infer* the fieldid from that name, and the *fieldid* is what's used in
generated code to drive the switch for demarshalling.  Concretely, in your
example, even if "fname" were set to either "f1" or "f2", the switch logic
is driven by fid being set to either 1 or 2.  And there's no way for that
to happen in a TProtocol, and specifically TSimpleJSONProtocol doesn't do
it.  But generally, there's no way for it to happen, b/c inferring fieldid
from fieldname depends on which message is being demarshalled, and the
*protocol* object doesn't have access to type-(IDL)-information at all.
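
To make the point concrete, here is a minimal sketch of the lookup a
name-based protocol would need. The names FieldMap, kFooFields, kBarFields and
resolveField are hypothetical, TType is a cut-down stand-in for the Thrift
enum, and the Foo/Bar structs are the ones from the first mail in this thread.
The same name "a" maps to id 1 in Foo and id 4 in Bar, so the lookup cannot be
done without knowing which struct is being read.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Simplified stand-in for Thrift's TType enum (only the values used here).
enum TType { T_STOP = 0, T_I32 = 8, T_STRING = 11, T_STRUCT = 12 };

// Hypothetical per-struct table: field-name -> (type, field-id).
using FieldMap = std::map<std::string, std::pair<TType, int16_t>>;

// Tables a code generator could emit for Foo and Bar.
static const FieldMap kFooFields = {
    {"a", {T_I32, 1}}, {"b", {T_STRING, 2}}, {"c", {T_STRUCT, 3}}};
static const FieldMap kBarFields = {{"a", {T_I32, 4}}, {"b", {T_STRING, 5}}};

// Resolve a field name read off the wire against one struct's table.
// Returns false (leaving ftype and fid untouched) if the name is unknown.
bool resolveField(const FieldMap& fields, const std::string& fname,
                  TType& ftype, int16_t& fid) {
  auto it = fields.find(fname);
  if (it == fields.end()) return false;
  ftype = it->second.first;
  fid = it->second.second;
  return true;
}

// The same name yields a different id depending on the struct's table.
void example() {
  TType type = T_STOP;
  int16_t id = 0;
  assert(resolveField(kFooFields, "a", type, id) && id == 1);
  assert(resolveField(kBarFields, "a", type, id) && id == 4);
}
```

The generated switch on fid could stay exactly as it is today; only the table
would have to reach the protocol somehow.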

I haven't yet implemented the change I contemplate, only b/c I wanted to
find out how open Thrift was, to such a change.  But I can do so, if it'll
help to explain what I mean -- it isn't difficult.

--chet--

On Mon, Oct 2, 2017 at 9:41 AM, Randy Abernethy <ra...@apache.org> wrote:

> Hi Chet,
>
> You say there is no mapping between the field names and type/ids, yet every
> struct (including param structs) hands just such data to the proto on
> write. Why are the field string names supplied to the
> TProtocol::writeFieldBegin method by the generated struct code
> insufficient? The write code passes the proto the field name, type and id;
> and the read is offered the opportunity to return them. Sounds like
> everything your new protocol would need is supplied. As per Jens you just
> need to serialize the data provided the way you want it (swapping field
> names for ids).
>
> What am I missing (I'm guessing something :-)?
>
> For example, thrift IDL (notice bold items):
>
> >>>>>>>>>>>>
>
> struct data {
>     1: i16 *f1*
>     2: i16 *f2*
> }
>
> <<<<<<<<<<<<<
>
> generates c++ write:
>
> >>>>>>>>>>
>
> uint32_t data::write(::apache::thrift::protocol::TProtocol* oprot) const {
>   uint32_t xfer = 0;
>   ::apache::thrift::protocol::TOutputRecursionTracker tracker(*oprot);
>   xfer += oprot->writeStructBegin("data");
>
>   xfer += oprot->writeFieldBegin("*f1*", ::apache::thrift::protocol::T_I16, 1);
>   xfer += oprot->writeI16(this->f1);
>   xfer += oprot->writeFieldEnd();
>
>   xfer += oprot->writeFieldBegin("*f2*", ::apache::thrift::protocol::T_I16, 2);
>   xfer += oprot->writeI16(this->f2);
>   xfer += oprot->writeFieldEnd();
>
>   xfer += oprot->writeFieldStop();
>   xfer += oprot->writeStructEnd();
>   return xfer;
> }
>
> <<<<<<<<<<<<<
>
> and read:
>
> >>>>>>>>>>>>>
>
> uint32_t data::read(::apache::thrift::protocol::TProtocol* iprot) {
>
>   ::apache::thrift::protocol::TInputRecursionTracker tracker(*iprot);
>   uint32_t xfer = 0;
>   std::string fname;
>   ::apache::thrift::protocol::TType ftype;
>   int16_t fid;
>
>   xfer += iprot->readStructBegin(fname);
>
>   using ::apache::thrift::protocol::TProtocolException;
>
>   while (true)
>   {
>     xfer += iprot->readFieldBegin(*fname*, ftype, fid);
>     if (ftype == ::apache::thrift::protocol::T_STOP) {
>       break;
>     }
>     switch (fid)
>     {
>       case 1:
>         if (ftype == ::apache::thrift::protocol::T_I16) {
>           xfer += iprot->readI16(this->f1);
>           this->__isset.f1 = true;
>         } else {
>           xfer += iprot->skip(ftype);
>         }
>         break;
>       case 2:
>         if (ftype == ::apache::thrift::protocol::T_I16) {
>           xfer += iprot->readI16(this->f2);
>           this->__isset.f2 = true;
>         } else {
>           xfer += iprot->skip(ftype);
>         }
>         break;
>       default:
>         xfer += iprot->skip(ftype);
>         break;
>     }
>     xfer += iprot->readFieldEnd();
>   }
>
>   xfer += iprot->readStructEnd();
>
>   return xfer;
> }
>
> <<<<<<<<<<<<<
>
>
> --Randy
>
>
> On Sat, Sep 30, 2017 at 6:38 AM, Edward Capriolo <ed...@gmail.com>
> wrote:
>
> > Also I wonder if what is meant by human readable is simply a clever way to
> > generate pcap modules so tools like wireshark/tcp dump can read the data.
> >
> >
> >
> > On Thu, Sep 28, 2017 at 3:49 PM, Jens Geyer <je...@hotmail.com>
> wrote:
> >
> > > Hi Chet,
> > >
> > > well, Thrift is primarily about efficiency, not human readability. If
> > > machines and programs talk to each other, nobody really needs human
> > > readable
> > > messages, because there are no humans involved, except maybe for
> > debugging
> > > (but that's not a real production use case).  If one asked you to pick
> > just
> > > one single feature about any Serialization and RPC library, potentially
> > > sacrificing any other requirement if needed, you probably would answer
> > that
> > > it should be as fast and efficient as possible.
> > >
> > > I only wonder if the human readability has sth to do with the fact that
> > > gRPC
> > > is often found being slower than Thrift ...  ;-)
> > >
> > > > You still want a human readable format? Ok, here's how to do it. Thrift
> > > indeed offers the ability to achieve that, because it is a framework.
> For
> > > example, look at the implementation of the TSimpleJSONProtocol (link
> > below)
> > > and use this as a starting point to write your own JSON-like TProtocol
> > > implementation that suits your needs. That's what makes Thrift so
> > flexible
> > > -
> > > even if you have special needs, you need to replace only those parts
> and
> > it
> > > still simply works. If you prefer XML or some other format, even that
> > > should
> > > be feasible, but you have to invest some work either way.
> > >
> > > > https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/protocol/TSimpleJSONProtocol.java
> > >
> > > Does that help you?
> > >
> > > Have fun,
> > > JensG
> > >
> > >
> > > > -----Original Message-----
> > > From: Chet Murthy
> > > Sent: Thursday, September 28, 2017 3:04 AM
> > > To: user@thrift.apache.org
> > > Subject: Human-readable wire-format for Thrift?
> > >
> > > [I hope I'm sending this mail to the right list -- it wasn't clear to
> me
> > > that it should go to thrift-dev, so I figured I'd send it here first.]
> > >
> > > The -one- thing that protobufs has going for it, over Thrift, is that
> > > protobufs has "CompactTextFormat" (and JSON too) as full wire-formats.
> > > This is .... incredibly useful for the following use-case:
> > >
> > > You want to write a config-file format, and you want to get the
> benefits
> > of
> > > version-to-version compatibility.  In your program, you'd like to
> access
> > a
> > > strongly-typed "config object" with typed fields, and you'd -like- for
> > > marshalling to/from flat-text to be automatically generated.
> > >
> > > I have personal experience with using protobufs in exactly this way,
> and
> > > it's really, really, really nice.
> > >
> > > The current Thrift JSON protocol isn't designed for this, and given the
> > > interface of the (C++) TProtocol class, I think it isn't possible.  But
> > > with a small change, it -would- be possible, so I thought I'd describe
> > the
> > > change, and see what you all thought (b/c it would require a change to
> > > generated code, and to the TProtocol base class interfaces
> (specifically
> > to
> > > the readFieldBegin method):
> > >
> > > [I'll describe this for the C++ generated code; I haven't looked
> > carefully
> > > into the rest of the languages, but I'd guess that something could be
> > > done.]
> > >
> > > (0) Let me first note that these datastructures are constant, and we're
> > > talking about passing an extra parameter to the read method listed
> above.
> > > That's it.
> > >
> > > (1) For concreteness, imagine a couple of message types
> > >
> > > struct Bar {
> > >   4: required i32 a ,
> > >   5: required string b,
> > > }
> > >
> > > struct Foo {
> > >   1: required i32 a ,
> > >   2: required string b,
> > >   3: required Bar c,
> > > }
> > >
> > > > Again for concreteness, here's an example of the JSON protocol for a
> > value
> > > of type Foo:
> > >
> > > {
> > >     "1": {
> > >         "i32": 1
> > >     },
> > >     "2": {
> > >         "str": "ugh"
> > >     },
> > >     "3": {
> > >         "rec": {
> > >             "4": {
> > >                 "i32": 2
> > >             },
> > >             "5": {
> > >                 "str": "argh"
> > >             }
> > >         }
> > >     }
> > > }
> > >
> > > (2) I'd prefer that that look like:
> > > {
> > >     "a": 1,
> > >     "b": "ugh",
> > >     "c": {
> > >          "a": 2,
> > >           "b": "argh"
> > >     }
> > > }
> > >
> > > (3) For each message-type, we need a mapping field-name ->
> > > pair<Thrift-type, field-id>.  So, generate a constant data-structure of
> > > type
> > >
> > > map<string, pair<Type, int16_t> >
> > >
> > > for each message-type.
> > >
> > > (3) Marshalling is easy -- all the field-names are known, and we could
> > just
> > > emit those instead of field-ids; similarly, we could skip putting
> > > type-information in the wire-format too.
> > >
> > > (4) At demarshalling time, we always know the type of the message we're
> > > demarshalling.  So as we read field-names, we can use the map in #3 to
> > look
> > > up TType and field-id, and then just demarshal in the normal way.  We
> > just
> > > need to pass that map as a constref to readFieldBegin.
> > >
> > > I -think- that that works, and can't find any problems with what I've
> > > described.
> > >
> > > I can make this change to the C++ library and code-generator, but
> before
> > I
> > > start down that path, I figured I should get some input on whether this
> > is
> > > something that the Thrift community (and maintainers) would accept?
> > >
> > > I think that a human-readable/writable wire would be immensely
> valuable,
> > > and not just for the example of config-files.
> > >
> > > Your feedback appreciated,
> > > --chet--
> > >
> > >
> >
>