Posted to user@thrift.apache.org by Jim McLaughlin <jo...@gmail.com> on 2015/08/30 21:46:13 UTC

MsgPack Protocol

Is it possible to create a MsgPack Protocol for Thrift? I'm starting a new
project and the lead wants to use MsgPack for serialization. I was thinking
of using Thrift for the IDL, but the only search results I get for Thrift
and MsgPack are comparisons between the two. This leads me to believe there
is something that prevents their cooperation.

Cheers,
Jim

Re: MsgPack Protocol

Posted by Jim McLaughlin <jo...@gmail.com>.
Thanks Randy, great answer! I was basically looking for confirmation that I
can create a MsgPack protocol for Thrift without trying to jam a square peg
into a round hole. The reason for doing this is that not all of the system
will use Thrift IDL for interaction, and the lead has already gone off and
prototyped that part of the system with MsgPack. I'll report back with my
results.

Best,
Jim


Re: MsgPack Protocol

Posted by Randy Abernethy <ra...@gmail.com>.
Hi Jim,

As you note, MsgPack is roughly equivalent to an Apache Thrift protocol
(like TBinaryProtocol, TCompactProtocol or TJSONProtocol) in that it
serializes and deserializes program data. That said, as far as I know there
are no current implementations of MsgPack in a form usable as an Apache
Thrift Protocol.

MsgPack is a schema-less serializer for arbitrary JSON-like data, whereas
Apache Thrift Protocols are IDL-driven serializers. If you want to use IDL
to describe your interfaces and are looking for minimal size, Apache Thrift
has TCompactProtocol, which produces serialized objects comparable in size
to MsgPack's. For example, the MsgPack web site
(http://msgpack.org/index.html) shows the JSON {"compact":true,"schema":0}
stored in 18 bytes. Using Apache Thrift and the TCompactProtocol (per the
IDL and Python example below), the output could be anywhere from 21 bytes
down to 4, depending on how you structure the IDL. The three extra bytes in
the larger example are associated with the Thrift packaging; adding data to
the map scales byte for byte with MsgPack (e.g. adding an additional 7-char
string key with a small int value would add 9 bytes in both systems). The
smaller 4-byte example uses ordinals to identify the struct fields, eliding
the strings altogether.
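
To make those byte counts concrete, here is a rough sketch that builds the
three encodings by hand using only the Python standard library. The helper
names (msgpack_small_map, zigzag_varint, compact_map_struct,
compact_flat_struct) are made up for this illustration and handle only the
cases needed here. The byte layouts follow the published MsgPack and
TCompactProtocol formats as I understand them, so treat this as an
approximation rather than a reference encoder.

```python
def msgpack_small_map(d):
    """MsgPack bytes for a small dict of short str keys -> bool / small int."""
    assert len(d) < 16
    out = bytearray([0x80 | len(d)])             # fixmap header
    for key, value in d.items():
        raw = key.encode("utf-8")
        assert len(raw) < 32
        out.append(0xA0 | len(raw))              # fixstr header
        out += raw
        if value is True:
            out.append(0xC3)
        elif value is False:
            out.append(0xC2)
        else:
            assert 0 <= value < 128
            out.append(value)                    # positive fixint
    return bytes(out)


def zigzag_varint(n):
    """Zigzag then varint encoding of a signed 64-bit integer."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def compact_map_struct(d):
    """Compact-protocol layout for: struct test { 1: map<string,i64> x }"""
    assert 0 < len(d) < 128
    out = bytearray([(1 << 4) | 0x0B])           # field id delta 1, type MAP
    out.append(len(d))                           # map size (one-byte varint)
    out.append((0x08 << 4) | 0x06)               # key BINARY, value I64
    for key, value in d.items():
        raw = key.encode("utf-8")
        assert len(raw) < 128
        out.append(len(raw))                     # string length varint
        out += raw
        out += zigzag_varint(value)
    out.append(0x00)                             # STOP
    return bytes(out)


def compact_flat_struct(compact, schema):
    """Compact-protocol layout for: struct test2 { 1: bool, 2: i64 }"""
    out = bytearray([(1 << 4) | (0x01 if compact else 0x02)])  # bool in header
    out.append((1 << 4) | 0x06)                  # field id delta 1, type I64
    out += zigzag_varint(schema)
    out.append(0x00)                             # STOP
    return bytes(out)


print(len(msgpack_small_map({"compact": True, "schema": 0})))   # 18
print(len(compact_map_struct({"compact": 1, "schema": 0})))     # 21
print(len(compact_flat_struct(True, 0)))                        # 4
```

The three-byte gap between 18 and 21 is the field header, the map key/value
type byte and the STOP byte; each map entry itself costs the same in both
formats.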

This is certainly not an apples-to-apples comparison, but it does highlight
some of the distinctions between the two systems. A MsgPack client can read
an arbitrary block of data and recover a JSON document. Apache Thrift
clients must know in advance what type of structure they are going to
receive (hence the IDL). For this reason you will probably find MsgPack
more flexible for ad hoc serialization but Apache Thrift faster (an order
of magnitude at least) when serializing/deserializing the same input. Also,
of course, Apache Thrift Protocols are integrated with the Apache Thrift
framework and interoperate seamlessly with the Apache Thrift RPC framework.
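
The difference between self-describing and schema-driven data can be
sketched in a few lines. The code below is an illustration only: unpack,
read_test2 and TEST2_FIELDS are hypothetical names, the MsgPack decoder
covers just a tiny subset of the format, and the Thrift reader handles only
bool and i64 fields with short-form headers. The point is that the MsgPack
bytes carry their own key names, while the compact-protocol bytes carry only
field ids and need the IDL (here a lookup table) to get names back.

```python
def unpack(buf, pos=0):
    """Decode one value from a tiny MsgPack subset; returns (value, next_pos)."""
    tag = buf[pos]
    pos += 1
    if tag <= 0x7F:                          # positive fixint
        return tag, pos
    if 0x80 <= tag <= 0x8F:                  # fixmap: count in the low 4 bits
        result = {}
        for _ in range(tag & 0x0F):
            key, pos = unpack(buf, pos)
            value, pos = unpack(buf, pos)
            result[key] = value
        return result, pos
    if 0xA0 <= tag <= 0xBF:                  # fixstr: length in the low 5 bits
        end = pos + (tag & 0x1F)
        return buf[pos:end].decode("utf-8"), end
    if tag == 0xC0:
        return None, pos
    if tag == 0xC2:
        return False, pos
    if tag == 0xC3:
        return True, pos
    raise ValueError("type byte 0x%02x not covered by this sketch" % tag)


# What the IDL contributes for struct test2: names for the field ids.
TEST2_FIELDS = {1: "compact", 2: "schema"}


def read_test2(buf):
    """Decode compact-protocol bytes for test2 using the table above."""
    fields, pos, field_id = {}, 0, 0
    while buf[pos] != 0x00:                  # 0x00 is the STOP byte
        header = buf[pos]
        pos += 1
        delta, ctype = header >> 4, header & 0x0F
        if delta == 0:
            raise ValueError("long-form field headers not covered here")
        field_id += delta
        if ctype in (0x01, 0x02):            # bool value lives in the header
            value = ctype == 0x01
        elif ctype == 0x06:                  # i64: varint, then undo zigzag
            z, shift = 0, 0
            while True:
                byte = buf[pos]
                pos += 1
                z |= (byte & 0x7F) << shift
                shift += 7
                if not byte & 0x80:
                    break
            value = (z >> 1) ^ -(z & 1)
        else:
            raise ValueError("compact type %d not covered here" % ctype)
        fields[TEST2_FIELDS[field_id]] = value
    return fields


msgpack_bytes = bytes.fromhex("82a7636f6d70616374c3a6736368656d6100")
thrift_bytes = bytes.fromhex("11160000")
print(unpack(msgpack_bytes)[0])      # no schema needed
print(read_test2(thrift_bytes))      # names come from TEST2_FIELDS
```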

In my view, for an arbitrary data storage system MsgPack might be better,
but for an RPC system, where the server needs to know what to do with the
data sent, Apache Thrift IDL/Protocols might be better. Also, though I'm
not sure of the utility, you could fairly easily convert one of the MsgPack
implementations into an Apache Thrift protocol. This would allow you to use
IDL-driven MsgPack for Thrift RPC, and arbitrary MsgPack clients to read
Thrift-serialized messages (e.g. sent over a message broker, teed from an
RPC stream for logging, or whatever). You could also use the Apache Thrift
TJSONProtocol for RPC and then MsgPack the JSON in a transport layer (see
the JSON protocol output in the example below). Lots of ways for the two to
play together.
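
One possible shape for such a protocol, as a rough sketch only: the class
below is standalone (it does not import Thrift) and just mirrors the names
of the writer callbacks that generated Python code invokes on a protocol
(writeStructBegin, writeFieldBegin, writeMapBegin and so on). The class
name MsgPackProtocolSketch, the helper write_test2, the pack() stand-in
encoder, and the choice to encode a struct as a MsgPack map keyed by field
name are all assumptions made for illustration. A real implementation would
presumably subclass TProtocolBase, write to its transport, implement the
read side, and hand the final encoding to an actual MsgPack library.

```python
TYPE_BOOL, TYPE_I64 = 2, 10      # same numeric values as Thrift's TType ids
_NO_KEY = object()               # marks "map is waiting for its next key"


def pack(obj):
    """Stand-in MsgPack encoder: bool, small int, short str, small dict."""
    if obj is True:
        return b"\xc3"
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int) and 0 <= obj < 128:
        return bytes([obj])
    if isinstance(obj, str) and len(obj.encode("utf-8")) < 32:
        raw = obj.encode("utf-8")
        return bytes([0xA0 | len(raw)]) + raw
    if isinstance(obj, dict) and len(obj) < 16:
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body
    raise ValueError("value not covered by this sketch: %r" % (obj,))


class MsgPackProtocolSketch:
    """Buffers struct/map contents, then emits MsgPack when the root closes.

    Buffering is needed because a MsgPack map header carries the entry
    count, which is unknown when writeStructBegin is called.
    """

    def __init__(self):
        self.output = b""
        self._frames = []        # stack of open structs and maps

    def _emit(self, value):
        if not self._frames:                     # root value is complete
            self.output += pack(value)
            return
        frame = self._frames[-1]
        if frame["kind"] == "map" and frame["key"] is _NO_KEY:
            frame["key"] = value                 # first half of a map entry
            return
        frame["items"][frame["key"]] = value
        if frame["kind"] == "map":
            frame["key"] = _NO_KEY

    def writeStructBegin(self, name):
        self._frames.append({"kind": "struct", "items": {}, "key": None})

    def writeStructEnd(self):
        self._emit(self._frames.pop()["items"])

    def writeFieldBegin(self, name, ttype, fid):
        self._frames[-1]["key"] = name           # field name becomes map key

    def writeFieldEnd(self):
        pass

    def writeFieldStop(self):
        pass

    def writeMapBegin(self, ktype, vtype, size):
        self._frames.append({"kind": "map", "items": {}, "key": _NO_KEY})

    def writeMapEnd(self):
        self._emit(self._frames.pop()["items"])

    def writeBool(self, value):
        self._emit(bool(value))

    def writeI64(self, value):
        self._emit(value)

    def writeString(self, value):
        self._emit(value)


def write_test2(oprot, compact, schema):
    """Mimics the call sequence a generated test2.write() would make."""
    oprot.writeStructBegin("test2")
    oprot.writeFieldBegin("compact", TYPE_BOOL, 1)
    oprot.writeBool(compact)
    oprot.writeFieldEnd()
    oprot.writeFieldBegin("schema", TYPE_I64, 2)
    oprot.writeI64(schema)
    oprot.writeFieldEnd()
    oprot.writeFieldStop()
    oprot.writeStructEnd()


proto = MsgPackProtocolSketch()
write_test2(proto, True, 0)
print(proto.output.hex())
```

Keying by field name keeps the output readable by any plain MsgPack client
at the cost of size; keying by field id instead would be closer to what
TCompactProtocol does.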

Hope this sheds some light.

Best,
Randy



#mpex.thrift
#########################################
struct test {
    1: map<string,i64> x
}

struct test2 {
    1: bool compact
    2: i64 schema
}


# test.py
#########################################
import sys
sys.path.append("gen-py")

from thrift.transport import TTransport
from thrift.protocol import TCompactProtocol
from thrift.protocol import TJSONProtocol
from mpex import ttypes

## sample map output  [21 bytes]
trans = TTransport.TFileObjectTransport(open("data","wb"))
trans.open()
proto = TCompactProtocol.TCompactProtocol(trans)

i = ttypes.test({})
i.x["compact"] = 1
i.x["schema"] = 0

i.write(proto)
trans.close()

## sample struct output  [4 bytes]
trans = TTransport.TFileObjectTransport(open("data2","wb"))
trans.open()
proto = TCompactProtocol.TCompactProtocol(trans)

i = ttypes.test2()
i.compact = True
i.schema = 0

i.write(proto)
trans.close()

## sample JSON output  [54 bytes]
trans = TTransport.TFileObjectTransport(open("data3.json","wb"))
trans.open()
proto = TJSONProtocol.TJSONProtocol(trans)

i = ttypes.test({})
i.x["compact"] = 1
i.x["schema"] = 0

i.write(proto)
trans.close()


# example run
#########################################
thrift@ubuntu:~/tmp$ thrift --gen py mpex.thrift
thrift@ubuntu:~/tmp$ python test.py
thrift@ubuntu:~/tmp$ ls -l data*
-rw-rw-r-- 1 thrift thrift 21 Aug 30 14:55 data
-rw-rw-r-- 1 thrift thrift  4 Aug 30 14:55 data2
-rw-rw-r-- 1 thrift thrift 54 Aug 30 14:55 data3.json




