Posted to dev@mina.apache.org by Emmanuel Lecharny <el...@gmail.com> on 2009/01/07 22:20:47 UTC

Google Protocol Buffer codec...

Hi guys,

A couple of weeks ago, Tomasz Blachowicz submitted a patch containing a
Google Protocol Buffers codec (see
https://issues.apache.org/jira/browse/DIRMINA-654).

I have just committed it to the sandbox so that we can review the code:
http://svn.apache.org/viewvc?rev=732497&view=rev

If some of you can review the code and tell us whether it's OK, we would then
be able to move it into the project (maybe after 2.0; it depends).

Alternatively, if you have any suggestions, or think that it's not worth making
it part of MINA, feel free to give some feedback.

Many thanks!

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: Google Protocol Buffer codec...

Posted by Tomasz Blachowicz <tb...@gmail.com>.
On Thu, Jan 8, 2009 at 9:44 AM, Emmanuel Lecharny <el...@gmail.com> wrote:
>
> > AFAIK, Protocol Buffers is a binary data binding framework, similar
> > to XML data binding frameworks like JAXB, Castor and XMLBeans.
> > Google's implementation is reminiscent of CORBA IDL.
> >
> > There is a basic operational similarity between the Google Protocol
> > Buffers codec and the prefixed string decoder: we have a length and
> > then the actual message. It would be worthwhile to explore whether we
> > can align the Google codec with the prefixed string codec.
>

One of the good candidates for such a codec is Thrift, initially developed by
Facebook and recently contributed to Apache
(http://incubator.apache.org/thrift). Thrift is very similar to Protocol
Buffers.

> Yep. The idea is to get a codec for those who want to use GPB (Google
> Protocol Buffers) with MINA. I guess that at some point, we will have a
> library of such codecs...
>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>

Re: Google Protocol Buffer codec...

Posted by Ashish <pa...@gmail.com>.
On Thu, Jan 8, 2009 at 3:14 PM, Emmanuel Lecharny <el...@gmail.com> wrote:
> On Thu, Jan 8, 2009 at 7:16 AM, Ashish <pa...@gmail.com> wrote:
>> Some observations:
>> 1. The sandbox folder contains the whole MINA code. It would be great if
>> we could remove the other files.
> You mean, cleaning all the old branches out of the sandbox?

Check http://svn.apache.org/viewvc/mina/sandbox/protocol-buffers/

It contains all the MINA code. Shouldn't it contain just
filter-codec-protobuf? Or am I missing something?

> Yep. The idea is to get a codec for those who want to use GPB (Google
> Protocol Buffers) with MINA. I guess that at some point, we will have a
> library of such codecs...

Agreed, this will make MINA more user friendly. Something like: pick a
codec, add it to the chain, and the base solution is ready.
Add toppings of your choice to taste.

Re: Google Protocol Buffer codec...

Posted by Emmanuel Lecharny <el...@gmail.com>.
On Thu, Jan 8, 2009 at 7:16 AM, Ashish <pa...@gmail.com> wrote:
> Some observations:
> 1. The sandbox folder contains the whole MINA code. It would be great if
> we could remove the other files.

You mean, cleaning all the old branches out of the sandbox?


> AFAIK, Protocol Buffers is a binary data binding framework, similar
> to XML data binding frameworks like JAXB, Castor and XMLBeans.
> Google's implementation is reminiscent of CORBA IDL.
>
> There is a basic operational similarity between the Google Protocol
> Buffers codec and the prefixed string decoder: we have a length and
> then the actual message. It would be worthwhile to explore whether we
> can align the Google codec with the prefixed string codec.

Yep. The idea is to get a codec for those who want to use GPB (Google
Protocol Buffers) with MINA. I guess that at some point, we will have a
library of such codecs...


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: Google Protocol Buffer codec...

Posted by James Mansion <ja...@mansionfamily.plus.com>.
Ashish wrote:
> For most of the well-established protocols like FTP, HTTP, SNMP, LDAP
> etc., they don't add value.
>   
Indeed, but it's also uninteresting to implement them, to me anyway.
There are lots of implementations available already, and they
overcomplicate (and generally destroy the performance of) simple async
messaging between the components of an application. The key point is
that the message interaction pattern of these protocols isn't always a
good match for the application's requirements, at least internally.
Each to his own, I guess.


Re: Google Protocol Buffer codec...

Posted by Ashish <pa...@gmail.com>.
On Sun, Jan 11, 2009 at 3:25 PM, James Mansion
<ja...@mansionfamily.plus.com> wrote:
> Ashish wrote:
>>
>> with the prefixed string decoder: we have a length and then the actual
>> message. It would be worthwhile to explore whether we can align the Google
>> codec with the prefixed string codec.
>>
>
> Almost all message-oriented protocols seem to have:
> - a fixed-length header containing a length, possibly implied
> - the payload

Agree

> I suspect that you can abstract most of them with a general binary
> framework that:
> - defines the fixed-length part
> - registers a callback that can process the fixed-length header and
> return the payload length
> - registers a callback that receives both the fixed-length header and
> the payload
>
> The key is that the length need not be at the start of the header, and
> may be computed.
>
> The difficulty is deciding how to handle long payloads. It may be that
> for long data you should stream to a file and then pass the header plus
> the stream, but doing so implies a piecewise payload handler, and that
> should probably be exposed too, for cases where the application wishes
> to process the data as it goes.
>
My comment was more from the perspective that such frameworks are good if
we are starting from scratch :-)
For most of the well-established protocols like FTP, HTTP, SNMP, LDAP
etc., they don't add value.
Essentially, if we had some framework to reduce the development work for
these application-layer (AL) protocols, that would be great.
Let's see if someone does something along these lines in a PhD thesis :-)

Re: Google Protocol Buffer codec...

Posted by James Mansion <ja...@mansionfamily.plus.com>.
Ashish wrote:
> with the prefixed string decoder: we have a length and then the actual
> message. It would be worthwhile to explore whether we can align the Google
> codec with the prefixed string codec.
>   
Almost all message-oriented protocols seem to have:
 - a fixed-length header containing a length, possibly implied
 - the payload

I suspect that you can abstract most of them with a general binary framework that:
 - defines the fixed-length part
 - registers a callback that can process the fixed-length header and
   return the payload length
 - registers a callback that receives both the fixed-length header and
   the payload

The key is that the length need not be at the start of the header, and 
may be computed.
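A minimal Java sketch of the framework described above, using only java.nio. HeaderFramer and its callbacks are hypothetical names, not a MINA API: the caller fixes the header size, supplies a callback that computes the payload length from the header (so the length can sit anywhere in it, or be derived), and supplies a handler that receives header and payload together. An incomplete trailing frame is simply left in the buffer.

```java
import java.nio.ByteBuffer;
import java.util.function.BiConsumer;
import java.util.function.ToIntFunction;

/**
 * Hypothetical generic framer (illustration only, not a MINA class):
 * fixed-length header + callback computing the payload length + handler.
 */
final class HeaderFramer {
    private final int headerLength;
    private final ToIntFunction<ByteBuffer> payloadLength; // sees a read-only view of the header
    private final BiConsumer<byte[], byte[]> handler;      // receives (header, payload)

    HeaderFramer(int headerLength,
                 ToIntFunction<ByteBuffer> payloadLength,
                 BiConsumer<byte[], byte[]> handler) {
        this.headerLength = headerLength;
        this.payloadLength = payloadLength;
        this.handler = handler;
    }

    /**
     * Delivers every complete frame in the buffer (read mode) and returns how
     * many were delivered. An incomplete trailing frame is left unread.
     */
    int drain(ByteBuffer in) {
        int frames = 0;
        while (in.remaining() >= headerLength) {
            ByteBuffer headerView = in.slice();
            headerView.limit(headerLength);
            // The length may sit anywhere in the header and may be computed.
            int bodyLength = payloadLength.applyAsInt(headerView.asReadOnlyBuffer());
            if (bodyLength < 0) {
                throw new IllegalStateException("negative payload length: " + bodyLength);
            }
            if (in.remaining() - headerLength < bodyLength) {
                break; // wait for more bytes; position is untouched
            }
            byte[] header = new byte[headerLength];
            byte[] payload = new byte[bodyLength];
            in.get(header).get(payload);
            handler.accept(header, payload);
            frames++;
        }
        return frames;
    }
}
```

For example, a 6-byte header whose bytes 2..3 count 32-bit words could pass h -> (h.getShort(2) & 0xFFFF) * 4 as the length callback.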

The difficulty is deciding how to handle long payloads. It may be that for
long data you should stream to a file and then pass the header plus the
stream, but doing so implies a piecewise payload handler, and that should
probably be exposed too, for cases where the application wishes to process
the data as it goes.
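For the long-payload case, a piecewise handler might look like the sketch below. PayloadListener and StreamingFrameDecoder are hypothetical names, and the 4-byte big-endian length header is an assumption made for illustration. Each chunk is handed to the application as soon as it arrives, so the application can write it to a file or process it incrementally instead of buffering the whole payload.

```java
import java.nio.ByteBuffer;

/** Hypothetical callback for payloads delivered piece by piece. */
interface PayloadListener {
    void onStart(long payloadLength);
    void onChunk(ByteBuffer chunk); // only valid during the call
    void onEnd();
}

/**
 * Sketch of a piecewise decoder for frames with an assumed 4-byte big-endian
 * length header: payload bytes are forwarded as they arrive, never buffered.
 */
final class StreamingFrameDecoder {
    private final PayloadListener listener;
    private final ByteBuffer header = ByteBuffer.allocate(4);
    private long remainingPayload = -1; // -1 means "still reading the header"

    StreamingFrameDecoder(PayloadListener listener) {
        this.listener = listener;
    }

    /** Feeds whatever bytes arrived from the network (buffer in read mode). */
    void feed(ByteBuffer in) {
        while (in.hasRemaining()) {
            if (remainingPayload < 0) {
                // Collect header bytes until the 4-byte length is complete.
                while (header.hasRemaining() && in.hasRemaining()) {
                    header.put(in.get());
                }
                if (header.hasRemaining()) {
                    return; // header still incomplete
                }
                header.flip();
                remainingPayload = header.getInt() & 0xFFFFFFFFL;
                header.clear();
                listener.onStart(remainingPayload);
            }
            int n = (int) Math.min(in.remaining(), remainingPayload);
            if (n > 0) {
                ByteBuffer chunk = in.slice();
                chunk.limit(n);
                listener.onChunk(chunk);
                in.position(in.position() + n);
                remainingPayload -= n;
            }
            if (remainingPayload == 0) {
                listener.onEnd();
                remainingPayload = -1; // next frame starts with a new header
            }
        }
    }
}
```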

James


Re: Google Protocol Buffer codec...

Posted by Ashish <pa...@gmail.com>.
On Thu, Jan 8, 2009 at 10:55 PM, Tomasz Blachowicz
<tb...@gmail.com> wrote:
> On Thu, Jan 8, 2009 at 6:16 AM, Ashish <pa...@gmail.com> wrote:
>
>> Some observations:
>> 1. The sandbox folder contains the whole MINA code. It would be great if
>> we could remove the other files.
>
>
> The filter codec I've developed to handle Google Protocol Buffers essentially
> lives in its own module, mina-filter-codec-protobuf, and depends only on the
> mina-core module. So, if you want to focus only on the code I've
> contributed, please look at the mina-filter-codec-protobuf module.

[ashish] This is fine :-)

>
>> AFAIK, Protocol Buffers is a binary data binding framework, similar
>> to XML data binding frameworks like JAXB, Castor and XMLBeans.
>> Google's implementation is reminiscent of CORBA IDL.
>>
>> There is a basic operational similarity between the Google Protocol
>> Buffers codec and the prefixed string decoder: we have a length and
>> then the actual message. It would be worthwhile to explore whether we
>> can align the Google codec with the prefixed string codec.
>>

> Yes, the prefixed string codec is similar to the one I've developed for
> protobuf messages. However, there are two things to be considered before we
> start any alignment:
> 1. The prefixed string codec has been implemented in the mina-core module,
> while the protobuf codec resides in its own *optional* module.

[ashish] Well, alignment is desirable; it's good to have similar codecs
behave in the same way and be coded following the same pattern.
It's easier for users to understand.

> 2. The prefixed string codec uses a fixed format to encode the length of the
> message, whereas the protobuf codec uses varints to encode integer values
> (http://code.google.com/apis/protocolbuffers/docs/encoding.html).
> This way of encoding is pretty nifty and can save bandwidth if you are
> sending many lightweight messages. For instance, values less than 128 require
> only one byte to encode and values less than 16K require two. What is more,
> the prefixed string codec uses a method on the IoBuffer implementation to
> determine the size, while the protobuf codec determines the size internally
> using varint encoding.

[ashish] The idea is that you get the length before the message; how you
determine the length can vary. BER encoding has TLV (tag-length-value)
encoding, which is something similar. I'm not actually questioning how
Google's protocol behaves.
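To illustrate the BER comparison: in BER (ITU-T X.690) the length field is itself variable-sized. A length below 128 fits in a single octet (short form); otherwise the first octet is 0x80 plus the number of length octets that follow (long form). A minimal Java sketch of reading a definite-form length; BerLength is an illustrative name, the 4-octet limit is a simplification, and the indefinite form (0x80) is not handled.

```java
import java.nio.ByteBuffer;

/** Sketch of reading a BER definite-form length (at most 4 length octets). */
final class BerLength {
    private BerLength() {}

    /** Returns the decoded length, or -1 (position unchanged) if more bytes are needed. */
    static long read(ByteBuffer in) {
        if (!in.hasRemaining()) {
            return -1;
        }
        int start = in.position();
        int first = in.get() & 0xFF;
        if (first < 0x80) {
            return first; // short form: the octet is the length itself
        }
        int count = first & 0x7F; // long form: number of length octets that follow
        if (count == 0) {
            throw new IllegalArgumentException("indefinite length (0x80) not handled in this sketch");
        }
        if (count > 4) {
            throw new IllegalArgumentException("length uses " + count + " octets; sketch supports at most 4");
        }
        if (in.remaining() < count) {
            in.position(start); // roll back and wait for more data
            return -1;
        }
        long length = 0;
        for (int i = 0; i < count; i++) {
            length = (length << 8) | (in.get() & 0xFF); // big-endian
        }
        return length;
    }
}
```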

>> There is one more situation that must be thought about: piggybacked
>> messages. The implementation can receive a packet with one message
>> ending and another message starting (again, by the nature of TCP). So
>> returning true from the decoder may result in an exception (more data
>> remaining). [Please correct me if I am wrong]
>
>
> This is not a problem: when the decoder receives a buffer of, say, 64 bytes
> and reads only 50 of them to decode the message (and returns true), the
> framework will present the remaining 14 bytes in the next call to the decode
> method. Please refer to the test cases I've developed.

[ashish] Well, it was more about how CumulativeProtocolDecoder works.
I thought that if you have more data in the buffer and you return true from
doDecode(), it would throw an exception. It seems my understanding is
incorrect; the documentation states "if there is any data left that
cannot be decoded, we store it in a buffer in the session and next
time this decoder is invoked the session buffer gets appended to".
Ignore my comment. I will have to rectify my implementation :-)
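The documented behaviour quoted above can be imitated in plain java.nio as a rough sketch. This is not MINA's CumulativeProtocolDecoder source (which works on IoBuffer and keeps the leftover bytes in the session); it only mirrors the idea: undecoded bytes are kept between reads, new bytes are appended to them, and doDecode() is called again as long as it returns true. ShortStringDecoder and its 1-byte length prefix are a hypothetical example protocol.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Plain java.nio imitation of cumulative decoding (illustration only). */
abstract class AccumulatingDecoder {
    // Stands in for the buffer MINA keeps in the session.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    /** Decodes one message into out; returns false, without consuming, if incomplete. */
    protected abstract boolean doDecode(ByteBuffer in, List<Object> out);

    final List<Object> decode(ByteBuffer received) {
        // Append the newly received bytes to the leftover from the last call.
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + received.remaining());
        in.put(pending).put(received);
        in.flip();
        List<Object> out = new ArrayList<>();
        while (in.hasRemaining()) {
            int before = in.position();
            if (!doDecode(in, out)) {
                break; // need more data
            }
            if (in.position() == before) {
                throw new IllegalStateException("doDecode() returned true without consuming data");
            }
        }
        pending = in.slice(); // leftover bytes wait for the next read
        return out;
    }
}

/** Hypothetical example protocol: 1-byte length prefix, then that many ASCII bytes. */
final class ShortStringDecoder extends AccumulatingDecoder {
    @Override
    protected boolean doDecode(ByteBuffer in, List<Object> out) {
        if (!in.hasRemaining()) {
            return false;
        }
        int len = in.get(in.position()) & 0xFF; // peek at the prefix
        if (in.remaining() < 1 + len) {
            return false; // message not complete yet
        }
        in.get(); // consume the prefix
        byte[] body = new byte[len];
        in.get(body);
        out.add(new String(body, StandardCharsets.US_ASCII));
        return true;
    }
}
```

Feeding the bytes 3 'a' 'b' 'c' 2 'd' yields "abc" and keeps 2 'd' pending; a later read containing 'e' then yields "de".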

>> Also, is it worthwhile to have a mechanism to add extensions to the
>> ExtensionRegistry? Or is the library already doing it?
>
>
> I spent some time playing around with Google Protocol Buffers and in my
> opinion extensions are one of the strongest features of the library.
> Because protobuf is not a self-descriptive protocol (as opposed to Java
> serialization), the metadata about the message carried over the wire is
> very limited and has to be provided to the decoder by the application that
> consumes the messages. However, I'm open to all advice and suggestions on
> how we might do it better :)

[ashish] Well, here it's more from a usability perspective. Again, I don't
understand the mechanism completely. It's more or less that you know
which objects you are going to decode for your application, so you
preload them somehow. In MPXJ, the library adds the appropriate decoder to a
map and returns the requested one. So we could replicate the same, if this
is how Protocol Buffers works :-)
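The map-based idea could be sketched like this. ParserRegistry and the integer type tag it keys on are hypothetical; this is not protobuf's ExtensionRegistry (which tells the parser about extension fields), just an application-side lookup of the parser to use for each message type the application expects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical application-side registry: parsers are registered up front,
 * one per expected message type, and looked up by a type tag read from the wire.
 */
final class ParserRegistry {
    private final Map<Integer, Function<byte[], ?>> parsers = new HashMap<>();

    ParserRegistry register(int typeTag, Function<byte[], ?> parser) {
        if (parsers.putIfAbsent(typeTag, parser) != null) {
            throw new IllegalArgumentException("type tag already registered: " + typeTag);
        }
        return this; // allows chained registration
    }

    /** Parses a payload whose type tag was read from the frame header. */
    Object parse(int typeTag, byte[] payload) {
        Function<byte[], ?> parser = parsers.get(typeTag);
        if (parser == null) {
            throw new IllegalArgumentException("no parser registered for type tag " + typeTag);
        }
        return parser.apply(payload);
    }
}
```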

>> I would like to spend some more time with the code before commenting
>> more on this. Still, it would be worthwhile to have this codec with us.
>> Once we have reviewed the code, let's vote on it.
>
>
> Any comments are welcome and highly appreciated!

[ashish] This is more for me; I will learn from your code :-)

>> I shall try to run this implementation and validate it against my
>> possible use case. Though I would have loved to have an implementation
>> where I don't have to define .proto files. (I know the implementation
>> doesn't work that way.)
>
>
> Well, in my opinion it'd be madness to write the Java class for the message
> by hand. What I can do is provide you with a Maven snippet that
> will allow you to generate Java code automatically from .proto files while
> you build the application. How does that sound?

[ashish] Hmm, not convinced by this. My targets are to create DHCP,
SMPP, DNS and TFTP servers, and Google's implementation doesn't help
me here in any way. This was the primary reason why I didn't go
further into Protocol Buffers. Anyway, that's off topic.

For MINA, having this codec is a big plus :-)

Re: Google Protocol Buffer codec...

Posted by Tomasz Blachowicz <tb...@gmail.com>.
On Thu, Jan 8, 2009 at 6:16 AM, Ashish <pa...@gmail.com> wrote:

> Some observations:
> 1. The sandbox folder contains the whole MINA code. It would be great if
> we could remove the other files.


The filter codec I've developed to handle Google Protocol Buffers essentially
lives in its own module, mina-filter-codec-protobuf, and depends only on the
mina-core module. So, if you want to focus only on the code I've
contributed, please look at the mina-filter-codec-protobuf module.


> AFAIK, Protocol Buffers is a binary data binding framework, similar
> to XML data binding frameworks like JAXB, Castor and XMLBeans.
> Google's implementation is reminiscent of CORBA IDL.
>
> There is a basic operational similarity between the Google Protocol
> Buffers codec and the prefixed string decoder: we have a length and
> then the actual message. It would be worthwhile to explore whether we
> can align the Google codec with the prefixed string codec.
>

Yes, the prefixed string codec is similar to the one I've developed for
protobuf messages. However, there are two things to be considered before we
start any alignment:
1. The prefixed string codec has been implemented in the mina-core module,
while the protobuf codec resides in its own *optional* module.
2. The prefixed string codec uses a fixed format to encode the length of the
message, whereas the protobuf codec uses varints to encode integer values
(http://code.google.com/apis/protocolbuffers/docs/encoding.html).
This way of encoding is pretty nifty and can save bandwidth if you are
sending many lightweight messages. For instance, values less than 128 require
only one byte to encode and values less than 16K require two. What is more,
the prefixed string codec uses a method on the IoBuffer implementation to
determine the size, while the protobuf codec determines the size internally
using varint encoding.
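To make the size difference concrete, here is a small Java sketch of the base-128 varint scheme from the encoding document linked above (7 bits per byte, least-significant group first, high bit set on every byte except the last), restricted to non-negative ints. A varint needs one byte for values below 128, two below 16,384 and three below 2,097,152, whereas a fixed 4-byte prefix always costs four. The class name Varint is illustrative, not taken from the contributed codec.

```java
import java.nio.ByteBuffer;

/** Sketch of base-128 varint sizing and encoding for non-negative ints. */
final class Varint {
    private Varint() {}

    /** Number of bytes needed to encode value as a varint. */
    static int size(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values not handled in this sketch");
        }
        int bytes = 1;
        while ((value >>>= 7) != 0) {
            bytes++; // one more byte per additional 7-bit group
        }
        return bytes;
    }

    /** Writes value as a varint at the buffer's current position. */
    static void write(ByteBuffer out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values not handled in this sketch");
        }
        while ((value & ~0x7F) != 0) {
            out.put((byte) ((value & 0x7F) | 0x80)); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.put((byte) value); // last byte: high bit clear
    }
}
```

For instance, 300 is written as the two bytes 0xAC 0x02, the same example the encoding document uses.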

> There is one more situation that must be thought about: piggybacked
> messages. The implementation can receive a packet with one message
> ending and another message starting (again, by the nature of TCP). So
> returning true from the decoder may result in an exception (more data
> remaining). [Please correct me if I am wrong]


This is not a problem: when the decoder receives a buffer of, say, 64 bytes
and reads only 50 of them to decode the message (and returns true), the
framework will present the remaining 14 bytes in the next call to the decode
method. Please refer to the test cases I've developed.
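A sketch of why leftover bytes are harmless, assuming a varint length prefix in front of each message (DelimitedFrames is a hypothetical helper, not the contributed codec's code). Each call consumes at most one complete frame and restores the position if the frame is incomplete. With a 64-byte buffer holding a 1-byte prefix plus a 49-byte message (50 bytes in total) followed by the first 14 bytes of the next message, the first call returns the 49-byte payload and leaves 14 bytes; a second call returns null until the rest of the next message arrives.

```java
import java.nio.ByteBuffer;

/**
 * Sketch of one decode step for varint-length-delimited frames. At most one
 * frame is consumed per call; bytes of the next (piggybacked) message stay
 * in the buffer for the following call.
 */
final class DelimitedFrames {
    private DelimitedFrames() {}

    /** Returns the next payload, or null (position restored) if the frame is incomplete. */
    static byte[] next(ByteBuffer in) {
        int start = in.position();
        int length = 0;
        int shift = 0;
        while (true) {
            if (!in.hasRemaining()) {
                in.position(start); // length prefix not complete yet
                return null;
            }
            byte b = in.get();
            length |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of the varint
            }
            shift += 7;
            if (shift > 28) {
                throw new IllegalArgumentException("varint length prefix too long");
            }
        }
        if (length < 0) {
            throw new IllegalArgumentException("length prefix overflows int");
        }
        if (in.remaining() < length) {
            in.position(start); // payload not complete yet
            return null;
        }
        byte[] payload = new byte[length];
        in.get(payload);
        return payload;
    }
}
```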


> Also, is it worthwhile to have a mechanism to add extensions to the
> ExtensionRegistry? Or is the library already doing it?


I spent some time playing around with Google Protocol Buffers and in my
opinion extensions are one of the strongest features of the library.
Because protobuf is not a self-descriptive protocol (as opposed to Java
serialization), the metadata about the message carried over the wire is
very limited and has to be provided to the decoder by the application that
consumes the messages. However, I'm open to all advice and suggestions on
how we might do it better :)
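To illustrate how little metadata travels on the wire, the sketch below walks the top-level fields of a serialized message without any schema. In the protobuf encoding each field begins with a varint key equal to (field_number << 3) | wire_type, so the decoder sees only field numbers, wire types and raw bytes, never field names or message types. Only wire types 0 (varint) and 2 (length-delimited) are handled; WireScan is an illustrative name, not part of the protobuf library.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Schema-less scan of top-level protobuf fields (wire types 0 and 2 only). */
final class WireScan {
    private WireScan() {}

    /** What the wire actually tells us about a field: a number, a wire type and raw data. */
    static final class Field {
        final int number;
        final int wireType;
        final long varint;  // meaningful for wire type 0
        final byte[] bytes; // meaningful for wire type 2

        Field(int number, int wireType, long varint, byte[] bytes) {
            this.number = number;
            this.wireType = wireType;
            this.varint = varint;
            this.bytes = bytes;
        }
    }

    static List<Field> scan(byte[] message) {
        ByteBuffer in = ByteBuffer.wrap(message);
        List<Field> fields = new ArrayList<>();
        while (in.hasRemaining()) {
            long key = readVarint(in);
            int number = (int) (key >>> 3);   // field number
            int wireType = (int) (key & 0x7); // low 3 bits
            if (wireType == 0) {
                fields.add(new Field(number, wireType, readVarint(in), null));
            } else if (wireType == 2) {
                byte[] data = new byte[(int) readVarint(in)];
                in.get(data); // BufferUnderflowException if truncated
                fields.add(new Field(number, wireType, 0, data));
            } else {
                throw new IllegalArgumentException("wire type " + wireType + " not handled in this sketch");
            }
        }
        return fields;
    }

    private static long readVarint(ByteBuffer in) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = in.get();
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }
}
```

Scanning the bytes 08 96 01 12 07 followed by the ASCII for "testing" yields field 1 (varint 150) and field 2 (7 raw bytes); whether field 2 is a string, a nested message or packed numbers is something only the application's schema can say.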


> I would like to spend some more time with the code before commenting
> more on this. Still, it would be worthwhile to have this codec with us.
> Once we have reviewed the code, let's vote on it.


Any comments are welcome and highly appreciated!


> I shall try to run this implementation and validate it against my
> possible use case. Though I would have loved to have an implementation
> where I don't have to define .proto files. (I know the implementation
> doesn't work that way.)


Well, in my opinion it'd be madness to write the Java class for the message
by hand. What I can do is provide you with a Maven snippet that
will allow you to generate Java code automatically from .proto files while
you build the application. How does that sound?

Cheers,
Tom


Re: Google Protocol Buffer codec...

Posted by Ashish <pa...@gmail.com>.
Some observations:
1. The sandbox folder contains the whole MINA code. It would be great if
we could remove the other files.

AFAIK, Protocol Buffers is a binary data binding framework, similar
to XML data binding frameworks like JAXB, Castor and XMLBeans.
Google's implementation is reminiscent of CORBA IDL.

There is a basic operational similarity between the Google Protocol
Buffers codec and the prefixed string decoder: we have a length and
then the actual message. It would be worthwhile to explore whether we
can align the Google codec with the prefixed string codec.
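The shared "length, then message" shape can be sketched with a fixed-size prefix. The 4-byte big-endian prefix, the UTF-8 charset and the FixedPrefixFraming name are assumptions for illustration; this is not MINA's prefixed string codec itself, only the general framing idea covering both the encode and decode sides.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch of fixed-size length-prefix framing (assumed 4-byte big-endian prefix). */
final class FixedPrefixFraming {
    static final int PREFIX_BYTES = 4;

    private FixedPrefixFraming() {}

    /** Encodes the message as [length][UTF-8 bytes], ready for reading. */
    static ByteBuffer encode(String message) {
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate(PREFIX_BYTES + body.length);
        out.putInt(body.length).put(body);
        out.flip();
        return out;
    }

    /** Returns the next message, or null (position unchanged) if not all bytes have arrived. */
    static String decode(ByteBuffer in) {
        if (in.remaining() < PREFIX_BYTES) {
            return null;
        }
        int length = in.getInt(in.position()); // peek; do not consume yet
        if (length < 0) {
            throw new IllegalArgumentException("negative length prefix");
        }
        if (in.remaining() - PREFIX_BYTES < length) {
            return null;
        }
        in.position(in.position() + PREFIX_BYTES);
        byte[] body = new byte[length];
        in.get(body);
        return new String(body, StandardCharsets.UTF_8);
    }
}
```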

There is one more situation that must be thought about: piggybacked
messages. The implementation can receive a packet with one message
ending and another message starting (again, by the nature of TCP). So
returning true from the decoder may result in an exception (more data
remaining). [Please correct me if I am wrong]

Also, is it worthwhile to have a mechanism to add extensions to the
ExtensionRegistry? Or is the library already doing it?

I would like to spend some more time with the code before commenting
more on this. Still, it would be worthwhile to have this codec with us.
Once we have reviewed the code, let's vote on it.

I shall try to run this implementation and validate it against my
possible use case. Though I would have loved to have an implementation
where I don't have to define .proto files. (I know the implementation
doesn't work that way.)

ashish
