Posted to dev@avro.apache.org by Bruce Mitchener <br...@gmail.com> on 2010/04/08 00:13:34 UTC

Thoughts on an RPC protocol

I'm assuming that the goals of an optimized transport for Avro RPC are
something like the following:

 * Framing should be efficient and easy to implement.
 * Streaming of large values, both as part of a request and as part of a
response, is very important.
 * It must be possible to have multiple concurrent requests in flight, while
still providing ordering guarantees where desired.
 * It should be easy to implement this in Java, C, Python, Ruby, etc.
 * Security is or will be important. This includes authorization as well as
privacy concerns.

I'd like to see something based largely upon RFC 3080, with some
simplifications and extensions:

    http://www.faqs.org/rfcs/rfc3080.html

What does this get us?

 * This system has mechanisms in place both for streaming a single large
message and for breaking a single reply up into multiple answers, allowing for
pretty flexible streaming.  (You can even mix these by having an answer that
is itself chunked.)
 * Concurrency is achieved by having multiple channels. Each channel
executes messages in order, so you have a good mechanism for sending
multiple things at once as well as maintaining ordering guarantees as
necessary.
 * Reporting errors is very clear as it is a separate response type.
 * It has already been specified pretty clearly and we'd just be evolving
that to something that more closely matches our needs.
 * It specifies sufficient data that you could implement this over
transports other than TCP, such as UDP.

Changes, rough list:

 * Use Avro-encoding for most things, so the encoding of a message would
become an Avro struct.
 * Lose profiles, in the sense that they're used in that specification, since
we're just exchanging Avro RPCs.
 * Do binary length prefixing of frames rather than carrying the size in the
textual header, so that it is very amenable to binary I/O at high volumes
(see the sketch after this list).
 * No XML stuff, just existing things like the Avro handshake, wrapped up in
messages.
 * For now, don't worry about things like flow control as expressed in RFC
3081, the mapping of RFC 3080 onto TCP.
 * Think about adding something for true one-way messages, but an empty
reply frame is probably sufficient, since that still allows reporting errors
if needed (or desired).
 * May well need some extensions for a more flexible security model.
 * Use Avro RPC stuff to encode the channel management commands on channel 0
rather than XML.
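
To make the length-prefixing change concrete, here is a minimal sketch in
Python (illustration only; the 4-byte big-endian prefix and the helper names
are assumptions, not part of the proposal):

    import struct

    def write_frame(stream, payload):
        # Prefix the Avro-encoded payload with its length so the reader
        # can pull exactly one frame off the stream with two reads.
        stream.write(struct.pack(">I", len(payload)))
        stream.write(payload)

    def read_frame(stream):
        # Read the 4-byte length prefix, then exactly that many payload bytes.
        (length,) = struct.unpack(">I", stream.read(4))
        return stream.read(length)

The point is just that a reader can pull one complete frame off the wire with
two fixed-size reads before handing the payload to the Avro decoder.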

RFC 3117 (http://www.faqs.org/rfcs/rfc3117.html) goes into some of the
philosophy and thinking behind the design of RFC 3080.  Both are short and
easy reading.

 - Bruce

Re: Thoughts on an RPC protocol

Posted by Bruce Mitchener <br...@gmail.com>.
Hi Bo,

Thanks for your feedback!

On Thu, Apr 8, 2010 at 3:49 PM, Bo Shi <bs...@gmail.com> wrote:

> Hi Bruce,
>
> Would this RPC protocol take the role of the transport in the Avro
> specification or would it replace the protocol?  If the handshake
> occurs on channel 0 while the request/response payloads are
> transferred on a different channel, this would not meet the existing
> wire protocol as described in the current 1.3.2 spec right?
>

This would be a transport.

I was thinking they would happen on 0 with 0 being a control channel, but
I'm not married to that. Offhand, I don't see why that would violate
anything in the specification though?

> A couple other questions inline:
>
> On Thu, Apr 8, 2010 at 11:54 AM, Bruce Mitchener
> <br...@gmail.com> wrote:
> > While I recommend actually reading RFC 3080 (it is an easy read), this
> > summary may help...
> >
> > Framing: Length prefixed data, nothing unusual.
> > Encoding: Messages are effectively this:
> >
> > enum message_type {
> >    message,            // a request
> >    reply,                  // when there's only a single reply
> >    answer,               // when there are multiple replies, send
> multiple
> > answers and then a null.
> >    null,                    // terminate a chain of replies
> >    error,                  // oops, there was an error
> > }
> >
> > struct message {
> >    enum message_type message_type;
> >    int channel;
> >    int message_id;
> >    bool more;          // Is this message complete, or is more data
> coming?
> > for streaming
> >    int sequence_number; // see RFC 3080
> >    optional int answer_number; // Used for answers
> >    bytes payload;   // The actual RPC command, still serialized here
> > }
> >
> > When a connection is opened, there's initially one channel, channel 0.
> That
> > channel is used for commands controlling the connection state, like
> opening
> > and closing channels.  We should also perform Avro RPC handshakes over
> > channel 0.
>
> Is channel 0 used exclusively as a control channel or would requests
> be allowed on this channel?  Any idea on what the control messages
> would look like?


Channel 0 would be a control channel in my original proposal. I can see
arguments, from a simplicity point of view, for allowing other requests to be
sent on it as well. Thoughts?

Control messages would look like an Avro RPC call. They would be things
like:

OpenChannel, which returns the opened channel (or you could pass it the
channel number?).
CloseChannel, which gets passed the channel number to close.
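
Purely as an illustration, and with names and types that are in no way
settled, the channel-management commands could be declared as an ordinary
Avro protocol. Shown here as a Python literal of the JSON form:

    # Hypothetical control protocol for channel 0; names/types are only a sketch.
    CONTROL_PROTOCOL = {
        "protocol": "ChannelControl",
        "namespace": "org.apache.avro.ipc",  # assumed namespace, not settled
        "messages": {
            "openChannel": {
                "doc": "Open a new channel and return its number.",
                "request": [],
                "response": "int"
            },
            "closeChannel": {
                "doc": "Close the given channel.",
                "request": [{"name": "channel", "type": "int"}],
                "response": "null"
            }
        }
    }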

> > Channels allow for concurrency.  You can send requests/messages down
> > multiple channels and process them independently. Messages on a single
> > channel need to be processed in order though. This allows for both
> > guaranteed order of execution (within a single channel) and greater
> > concurrency (multiple channels).
> >
> > Streaming happens in 2 ways.
>
> For streaming transfers, thoughts on optional compression codec
> attachment to streaming channels?  It may be useful for IO-bound
> applications, but if you're transferring files like avro object
> container files that are already compressed - you'd need some extra
> coordination (but maybe that's outside the problem domain).



Compression would probably be something useful to support, agreed. That
could happen in various ways:

   - Per channel (all data on the channel is compressed)
   - Per request via a header of some sort
   - Per connection (all data on all channels)

I suspect that I prefer per-channel from a simplicity point of view, but I'd
like to hear what people think.
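
As a rough sketch of the per-channel option (zlib is picked arbitrarily here
as the codec, and the helper is hypothetical): a channel opened with
compression enabled runs its payloads through the codec, other channels are
left alone.

    import zlib

    class ChannelCodec:
        # Hypothetical per-channel helper: frames on a compressed channel carry
        # zlib-compressed payloads; frames on other channels pass through as-is.
        def __init__(self, compressed):
            self.compressed = compressed

        def encode(self, payload):
            return zlib.compress(payload) if self.compressed else payload

        def decode(self, payload):
            return zlib.decompress(payload) if self.compressed else payload

Compressing each frame independently keeps frames self-contained, at some
cost in ratio compared to sharing a compression context across the channel.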

 - Bruce

Re: Thoughts on an RPC protocol

Posted by Bo Shi <bs...@gmail.com>.
Hi Bruce,

Would this RPC protocol take the role of the transport in the Avro
specification or would it replace the protocol?  If the handshake
occurs on channel 0 while the request/response payloads are
transferred on a different channel, this would not match the existing
wire protocol as described in the current 1.3.2 spec, right?

A couple other questions inline:

On Thu, Apr 8, 2010 at 11:54 AM, Bruce Mitchener
<br...@gmail.com> wrote:
> While I recommend actually reading RFC 3080 (it is an easy read), this
> summary may help...
>
> Framing: Length prefixed data, nothing unusual.
> Encoding: Messages are effectively this:
>
> enum message_type {
>    message,            // a request
>    reply,                  // when there's only a single reply
>    answer,               // when there are multiple replies, send multiple
> answers and then a null.
>    null,                    // terminate a chain of replies
>    error,                  // oops, there was an error
> }
>
> struct message {
>    enum message_type message_type;
>    int channel;
>    int message_id;
>    bool more;          // Is this message complete, or is more data coming?
> for streaming
>    int sequence_number; // see RFC 3080
>    optional int answer_number; // Used for answers
>    bytes payload;   // The actual RPC command, still serialized here
> }
>
> When a connection is opened, there's initially one channel, channel 0. That
> channel is used for commands controlling the connection state, like opening
> and closing channels.  We should also perform Avro RPC handshakes over
> channel 0.

Is channel 0 used exclusively as a control channel or would requests
be allowed on this channel?  Any idea on what the control messages
would look like?

>
> Channels allow for concurrency.  You can send requests/messages down
> multiple channels and process them independently. Messages on a single
> channel need to be processed in order though. This allows for both
> guaranteed order of execution (within a single channel) and greater
> concurrency (multiple channels).
>
> Streaming happens in 2 ways.

For streaming transfers, any thoughts on optionally attaching a compression
codec to streaming channels?  It may be useful for IO-bound
applications, but if you're transferring files like Avro object
container files that are already compressed, you'd need some extra
coordination (but maybe that's outside the problem domain).

>
> The first way is to flip the more flag on a message. This means that the
> data has been broken up over multiple messages and you need to receive the
> whole thing before processing it.
>
> The second is to have multiple answers (followed by a null frame) to a
> single request message.  This allows you to process the data in a streaming
> fashion.  The only thing that this doesn't allow is to process the data
> being sent in a streaming fashion, but you could look at doing that by
> sending multiple request messages instead.
>
> Security and privacy can be handled by SASL.
>
> The RFC defines a number of ways in which you can detect buggy
> implementations of the protocol or invalid data being sent (framing /
> encoding violations).
>
> This should be pretty straight forward to implement, and as such (and since
> I need such a thing in the immediate future), I've already begun an
> implementation in C.
>
>  - Bruce
>
> On Wed, Apr 7, 2010 at 4:13 PM, Bruce Mitchener
> <br...@gmail.com>wrote:
>
>> I'm assuming that the goals of an optimized transport for Avro RPC are
>> something like the following:
>>
>>  * Framing should be efficient, easy to implement.
>>  * Streaming of large values, both as part of a request and as a response
>> is very important.
>>  * Being able to have multiple concurrent requests in flight, while also
>> being able to have ordering guarantees where desired is necessary.
>>  * It should be easy to implement this in Java, C, Python, Ruby, etc.
>>  * Security is or will be important. This security can include
>> authorization as well as privacy concerns.
>>
>> I'd like to see something based largely upon RFC 3080, with some
>> simplifications and extensions:
>>
>>     http://www.faqs.org/rfcs/rfc3080.html
>>
>> What does this get us?
>>
>>  * This system has mechanisms in place for streaming both a single large
>> message and breaking a single reply up into multiple answers, allowing for
>> pretty flexible streaming.  (You can even mix these by having an answer that
>> gets chunked itself.)
>>  * Concurrency is achieved by having multiple channels. Each channel
>> executes messages in order, so you have a good mechanism for sending
>> multiple things at once as well as maintaining ordering guarantees as
>> necessary.
>>  * Reporting errors is very clear as it is a separate response type.
>>  * It has already been specified pretty clearly and we'd just be evolving
>> that to something that more closely matches our needs.
>>  * It specifies sufficient data that you could implement this over
>> transports other than TCP, such as UDP.
>>
>> Changes, rough list:
>>
>>  * Use Avro-encoding for most things, so the encoding of a message would
>> become an Avro struct.
>>  * Lose profiles in the sense that they're used in that specification since
>> we're just exchanging Avro RPCs.
>>  * Do length prefixing rather than in the header, so that it is very
>> amenable to binary I/O at high volumes.
>>  * No XML stuff, just existing things like the Avro handshake, wrapped up
>> in messages.
>>  * For now, don't worry about things like flow control as expressed in RFC
>> 3081, mapping of 3080 to TCP.
>>  * Think about adding something for true one-way messages, but an empty
>> reply frame is probably sufficient, since that still allows reporting errors
>> if needed (or desired).
>>  * May well need some extensions for a more flexible security model.
>>  * Use Avro RPC stuff to encode the channel management commands on channel
>> 0 rather than XML.
>>
>> RFC 3117 (http://www.faqs.org/rfcs/rfc3117.html) goes into some of the
>> philosophy and thinking behind the design of RFC 3080.  Both are short and
>> easy reading.
>>
>>  - Bruce
>>
>>
>

Re: Thoughts on an RPC protocol

Posted by Bruce Mitchener <br...@gmail.com>.
While I recommend actually reading RFC 3080 (it is an easy read), this
summary may help...

Framing: Length prefixed data, nothing unusual.
Encoding: Messages are effectively this:

enum message_type {
    message,            // a request
    reply,              // when there's only a single reply
    answer,             // when there are multiple replies, send multiple
                        // answers and then a null
    null,               // terminate a chain of replies
    error,              // oops, there was an error
}

struct message {
    enum message_type message_type;
    int channel;
    int message_id;
    bool more;          // Is this message complete, or is more data coming? (for streaming)
    int sequence_number;        // see RFC 3080
    optional int answer_number; // used for answers
    bytes payload;              // the actual RPC command, still serialized here
}
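
If the frame header were itself Avro-encoded, as the earlier "changes" list
suggests, its schema might look roughly like the following. This is only a
sketch, written as a Python literal of Avro's JSON schema form; the names and
field order are not settled:

    MESSAGE_TYPE = {
        "type": "enum",
        "name": "MessageType",
        "symbols": ["MESSAGE", "REPLY", "ANSWER", "NULL", "ERROR"]
    }

    MESSAGE_FRAME = {
        "type": "record",
        "name": "MessageFrame",
        "fields": [
            {"name": "messageType",    "type": MESSAGE_TYPE},
            {"name": "channel",        "type": "int"},
            {"name": "messageId",      "type": "int"},
            {"name": "more",           "type": "boolean"},   # more data coming (streaming)
            {"name": "sequenceNumber", "type": "int"},       # see RFC 3080
            {"name": "answerNumber",   "type": ["null", "int"], "default": None},
            {"name": "payload",        "type": "bytes"}      # the serialized RPC call
        ]
    }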

When a connection is opened, there's initially one channel, channel 0. That
channel is used for commands controlling the connection state, like opening
and closing channels.  We should also perform Avro RPC handshakes over
channel 0.

Channels allow for concurrency.  You can send requests/messages down
multiple channels and process them independently. Messages on a single
channel need to be processed in order though. This allows for both
guaranteed order of execution (within a single channel) and greater
concurrency (multiple channels).
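
A minimal sketch of how a server could realize that property (one possible
implementation only; a FIFO queue and worker thread per channel is just the
simplest thing that works): frames on the same channel are handled in arrival
order, while different channels proceed independently.

    import queue
    import threading

    class ChannelDispatcher:
        # Hypothetical helper: one FIFO queue and worker per channel keeps
        # per-channel ordering while letting channels run concurrently.
        def __init__(self, handler):
            self.handler = handler          # called as handler(channel, frame)
            self.queues = {}

        def dispatch(self, channel, frame):
            q = self.queues.get(channel)
            if q is None:
                q = self.queues[channel] = queue.Queue()
                threading.Thread(target=self._run, args=(channel, q),
                                 daemon=True).start()
            q.put(frame)

        def _run(self, channel, q):
            while True:
                self.handler(channel, q.get())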

Streaming happens in 2 ways.

The first way is to flip the more flag on a message. This means that the
data has been broken up over multiple messages and you need to receive the
whole thing before processing it.

The second is to have multiple answers (followed by a null frame) to a
single request message.  This allows the receiver to process the reply data
in a streaming fashion.  The only thing this doesn't allow is processing the
request data in a streaming fashion, but you could look at doing that by
sending multiple request messages instead.
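
As a small illustration of the second style (hypothetical names only;
send_frame stands in for whatever actually writes a frame): the server emits
one "answer" frame per result and a final "null" frame to terminate the
chain, so the client can consume results as they arrive.

    def send_streaming_reply(send_frame, channel, message_id, results):
        # Sketch only: one "answer" frame per result, then a "null" frame
        # to tell the client the chain of replies is complete.
        for n, result in enumerate(results):
            send_frame({"type": "answer", "channel": channel,
                        "message_id": message_id, "answer_number": n,
                        "payload": result})
        send_frame({"type": "null", "channel": channel,
                    "message_id": message_id})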

Security and privacy can be handled by SASL.

The RFC defines a number of ways in which you can detect buggy
implementations of the protocol or invalid data being sent (framing /
encoding violations).

This should be pretty straightforward to implement, and as such (and since
I need such a thing in the immediate future), I've already begun an
implementation in C.

 - Bruce

On Wed, Apr 7, 2010 at 4:13 PM, Bruce Mitchener
<br...@gmail.com>wrote:

> I'm assuming that the goals of an optimized transport for Avro RPC are
> something like the following:
>
>  * Framing should be efficient, easy to implement.
>  * Streaming of large values, both as part of a request and as a response
> is very important.
>  * Being able to have multiple concurrent requests in flight, while also
> being able to have ordering guarantees where desired is necessary.
>  * It should be easy to implement this in Java, C, Python, Ruby, etc.
>  * Security is or will be important. This security can include
> authorization as well as privacy concerns.
>
> I'd like to see something based largely upon RFC 3080, with some
> simplifications and extensions:
>
>     http://www.faqs.org/rfcs/rfc3080.html
>
> What does this get us?
>
>  * This system has mechanisms in place for streaming both a single large
> message and breaking a single reply up into multiple answers, allowing for
> pretty flexible streaming.  (You can even mix these by having an answer that
> gets chunked itself.)
>  * Concurrency is achieved by having multiple channels. Each channel
> executes messages in order, so you have a good mechanism for sending
> multiple things at once as well as maintaining ordering guarantees as
> necessary.
>  * Reporting errors is very clear as it is a separate response type.
>  * It has already been specified pretty clearly and we'd just be evolving
> that to something that more closely matches our needs.
>  * It specifies sufficient data that you could implement this over
> transports other than TCP, such as UDP.
>
> Changes, rough list:
>
>  * Use Avro-encoding for most things, so the encoding of a message would
> become an Avro struct.
>  * Lose profiles in the sense that they're used in that specification since
> we're just exchanging Avro RPCs.
>  * Do length prefixing rather than in the header, so that it is very
> amenable to binary I/O at high volumes.
>  * No XML stuff, just existing things like the Avro handshake, wrapped up
> in messages.
>  * For now, don't worry about things like flow control as expressed in RFC
> 3081, mapping of 3080 to TCP.
>  * Think about adding something for true one-way messages, but an empty
> reply frame is probably sufficient, since that still allows reporting errors
> if needed (or desired).
>  * May well need some extensions for a more flexible security model.
>  * Use Avro RPC stuff to encode the channel management commands on channel
> 0 rather than XML.
>
> RFC 3117 (http://www.faqs.org/rfcs/rfc3117.html) goes into some of the
> philosophy and thinking behind the design of RFC 3080.  Both are short and
> easy reading.
>
>  - Bruce
>
>

Re: Thoughts on an RPC protocol

Posted by Jeff Hodges <jh...@twitter.com>.
I may have misunderstood the direction this thread had taken. I'm
still going through the spec.
--
Jeff

On Sat, Apr 10, 2010 at 10:59 AM, Bruce Mitchener
<br...@gmail.com> wrote:
> What specific changes would you propose making to my proposal?
>
>  - Bruce
>
> On Sat, Apr 10, 2010 at 11:57 AM, Jeff Hodges <jh...@twitter.com> wrote:
>
>> Sorry for the spam. Python, java and apache httpd implementations
>> listed at the project page: http://www.chromium.org/spdy
>>
>> On Sat, Apr 10, 2010 at 10:53 AM, Jeff Hodges <jh...@twitter.com> wrote:
>> > Oh, and it's been partially implemented in Chromium, so there's a
>> > quasi-reference implementation.
>> > --
>> > Jeff
>> >
>> > On Sat, Apr 10, 2010 at 10:48 AM, Jeff Hodges <jh...@twitter.com>
>> wrote:
>> >> To throw another set of ideas into the hat, SPDY[1][2] would be good
>> >> to learn from. SPDY takes the basics of HTTP and makes them fast.
>> >> Benefits we would enjoy include:
>> >>
>> >> * Multiplexed streams
>> >> * Request prioritization
>> >> * HTTP header compression
>> >> * Server push
>> >>
>> >> Currently in draft form.
>> >>
>> >> [1] http://dev.chromium.org/spdy/spdy-whitepaper
>> >> [2] http://dev.chromium.org/spdy/spdy-protocol/spdy-protocol-draft2
>> >> --
>> >> Jeff
>> >> On Fri, Apr 9, 2010 at 2:29 PM, Doug Cutting <cu...@apache.org>
>> wrote:
>> >>> Scott Carey wrote:
>> >>>>
>> >>>> I also have not wrapped my head around routing/proxy use cases.  From
>> >>>> a somewhat ignorant perspective on them -- I'd rather have a solid
>> >>>> point-to-point protocol that just works, is simple, and can meet the
>> >>>> vast majority of use cases with high performance than one that
>> >>>> happens to be capable of sophisticated routing but has a lot of other
>> >>>> limitations or is a lot harder to implement and debug.
>> >>>
>> >>> FWIW, they're theoretical at this point.  I was only stating that
>> prefixing
>> >>> every request and response with handshakes makes stuff like proxies
>> trivial,
>> >>> since the protocol becomes stateless.  Once we start having sessions
>> things
>> >>> get trickier.  For example, many HTTP client libraries cache
>> connections,
>> >>> so, if you're building on top of one of those, it's hard to know when a
>> new
>> >>> connection is opened.
>> >>>
>> >>> One approach is to declare that the current framing and handshake rules
>> only
>> >>> apply to HTTP, currently our only standard transport.  Then we can
>> define a
>> >>> new transport that's point-to-point, stateful, etc. which may handle
>> framing
>> >>> and handshakes differently.  Thus we can retain back-compatibility.
>>  Make
>> >>> sense?
>> >>>
>> >>> Doug
>> >>>
>> >>
>> >
>>
>

Re: Thoughts on an RPC protocol

Posted by Bruce Mitchener <br...@gmail.com>.
What specific changes would you propose making to my proposal?

 - Bruce

On Sat, Apr 10, 2010 at 11:57 AM, Jeff Hodges <jh...@twitter.com> wrote:

> Sorry for the spam. Python, java and apache httpd implementations
> listed at the project page: http://www.chromium.org/spdy
>
> On Sat, Apr 10, 2010 at 10:53 AM, Jeff Hodges <jh...@twitter.com> wrote:
> > Oh, and it's been partially implemented in Chromium, so there's a
> > quasi-reference implementation.
> > --
> > Jeff
> >
> > On Sat, Apr 10, 2010 at 10:48 AM, Jeff Hodges <jh...@twitter.com>
> wrote:
> >> To throw another set of ideas into the hat, SPDY[1][2] would be good
> >> to learn from. SPDY takes the basics of HTTP and makes them fast.
> >> Benefits we would enjoy include:
> >>
> >> * Multiplexed streams
> >> * Request prioritization
> >> * HTTP header compression
> >> * Server push
> >>
> >> Currently in draft form.
> >>
> >> [1] http://dev.chromium.org/spdy/spdy-whitepaper
> >> [2] http://dev.chromium.org/spdy/spdy-protocol/spdy-protocol-draft2
> >> --
> >> Jeff
> >> On Fri, Apr 9, 2010 at 2:29 PM, Doug Cutting <cu...@apache.org>
> wrote:
> >>> Scott Carey wrote:
> >>>>
> >>>> I also have not wrapped my head around routing/proxy use cases.  From
> >>>> a somewhat ignorant perspective on them -- I'd rather have a solid
> >>>> point-to-point protocol that just works, is simple, and can meet the
> >>>> vast majority of use cases with high performance than one that
> >>>> happens to be capable of sophisticated routing but has a lot of other
> >>>> limitations or is a lot harder to implement and debug.
> >>>
> >>> FWIW, they're theoretical at this point.  I was only stating that
> prefixing
> >>> every request and response with handshakes makes stuff like proxies
> trivial,
> >>> since the protocol becomes stateless.  Once we start having sessions
> things
> >>> get trickier.  For example, many HTTP client libraries cache
> connections,
> >>> so, if you're building on top of one of those, it's hard to know when a
> new
> >>> connection is opened.
> >>>
> >>> One approach is to declare that the current framing and handshake rules
> only
> >>> apply to HTTP, currently our only standard transport.  Then we can
> define a
> >>> new transport that's point-to-point, stateful, etc. which may handle
> framing
> >>> and handshakes differently.  Thus we can retain back-compatibility.
>  Make
> >>> sense?
> >>>
> >>> Doug
> >>>
> >>
> >
>

Re: Thoughts on an RPC protocol

Posted by Jeff Hodges <jh...@twitter.com>.
Sorry for the spam. Python, Java, and Apache httpd implementations are
listed at the project page: http://www.chromium.org/spdy

On Sat, Apr 10, 2010 at 10:53 AM, Jeff Hodges <jh...@twitter.com> wrote:
> Oh, and it's been partially implemented in Chromium, so there's a
> quasi-reference implementation.
> --
> Jeff
>
> On Sat, Apr 10, 2010 at 10:48 AM, Jeff Hodges <jh...@twitter.com> wrote:
>> To throw another set of ideas into the hat, SPDY[1][2] would be good
>> to learn from. SPDY takes the basics of HTTP and makes them fast.
>> Benefits we would enjoy include:
>>
>> * Multiplexed streams
>> * Request prioritization
>> * HTTP header compression
>> * Server push
>>
>> Currently in draft form.
>>
>> [1] http://dev.chromium.org/spdy/spdy-whitepaper
>> [2] http://dev.chromium.org/spdy/spdy-protocol/spdy-protocol-draft2
>> --
>> Jeff
>> On Fri, Apr 9, 2010 at 2:29 PM, Doug Cutting <cu...@apache.org> wrote:
>>> Scott Carey wrote:
>>>>
>>>> I also have not wrapped my head around routing/proxy use cases.  From
>>>> a somewhat ignorant perspective on them -- I'd rather have a solid
>>>> point-to-point protocol that just works, is simple, and can meet the
>>>> vast majority of use cases with high performance than one that
>>>> happens to be capable of sophisticated routing but has a lot of other
>>>> limitations or is a lot harder to implement and debug.
>>>
>>> FWIW, they're theoretical at this point.  I was only stating that prefixing
>>> every request and response with handshakes makes stuff like proxies trivial,
>>> since the protocol becomes stateless.  Once we start having sessions things
>>> get trickier.  For example, many HTTP client libraries cache connections,
>>> so, if you're building on top of one of those, it's hard to know when a new
>>> connection is opened.
>>>
>>> One approach is to declare that the current framing and handshake rules only
>>> apply to HTTP, currently our only standard transport.  Then we can define a
>>> new transport that's point-to-point, stateful, etc. which may handle framing
>>> and handshakes differently.  Thus we can retain back-compatibility.  Make
>>> sense?
>>>
>>> Doug
>>>
>>
>

Re: Thoughts on an RPC protocol

Posted by Jeff Hodges <jh...@twitter.com>.
Oh, and it's been partially implemented in Chromium, so there's a
quasi-reference implementation.
--
Jeff

On Sat, Apr 10, 2010 at 10:48 AM, Jeff Hodges <jh...@twitter.com> wrote:
> To throw another set of ideas into the hat, SPDY[1][2] would be good
> to learn from. SPDY takes the basics of HTTP and makes them fast.
> Benefits we would enjoy include:
>
> * Multiplexed streams
> * Request prioritization
> * HTTP header compression
> * Server push
>
> Currently in draft form.
>
> [1] http://dev.chromium.org/spdy/spdy-whitepaper
> [2] http://dev.chromium.org/spdy/spdy-protocol/spdy-protocol-draft2
> --
> Jeff
> On Fri, Apr 9, 2010 at 2:29 PM, Doug Cutting <cu...@apache.org> wrote:
>> Scott Carey wrote:
>>>
>>> I also have not wrapped my head around routing/proxy use cases.  From
>>> a somewhat ignorant perspective on them -- I'd rather have a solid
>>> point-to-point protocol that just works, is simple, and can meet the
>>> vast majority of use cases with high performance than one that
>>> happens to be capable of sophisticated routing but has a lot of other
>>> limitations or is a lot harder to implement and debug.
>>
>> FWIW, they're theoretical at this point.  I was only stating that prefixing
>> every request and response with handshakes makes stuff like proxies trivial,
>> since the protocol becomes stateless.  Once we start having sessions things
>> get trickier.  For example, many HTTP client libraries cache connections,
>> so, if you're building on top of one of those, it's hard to know when a new
>> connection is opened.
>>
>> One approach is to declare that the current framing and handshake rules only
>> apply to HTTP, currently our only standard transport.  Then we can define a
>> new transport that's point-to-point, stateful, etc. which may handle framing
>> and handshakes differently.  Thus we can retain back-compatibility.  Make
>> sense?
>>
>> Doug
>>
>

Re: Thoughts on an RPC protocol

Posted by Jeff Hodges <jh...@twitter.com>.
Er, note that I'm generally lightly positive on using it as an
influence on the protocol, even if I find portions of the philosophy
less than ideal. I mentioned the lack of community because a lack of
active users means whatever obstacles occur due to its fundamental nature
aren't well-known and well-defined. That's reasonable, if not ideal,
as we are already talking about building our own RPC!
--
Jeff

On Mon, Apr 12, 2010 at 2:48 PM, Jeff Hodges <jh...@twitter.com> wrote:
> I hadn't thought of adding SASL to SPDY. I haven't dived into the
> community, yet, to see if they've discussed it.
>
> The philosophy behind BEEP seems pretty good, even if the crazy XML
> style of the actual spec is not a good match (as put in an earlier
> email). The lack of community is worrisome, of course, but having it
> as an influence in our own design is probably worthwhile.
>
> I do worry about the desire not to "name" things. URIs seem so
> obviously a good in a system that they should be consistent for any
> implementation of the protocol and instance of its use. However, I
> recognize how much of a bite that is to chew if we don't take a
> substantial portion of a spec into our own that already has that
> defined. I'm not trying to push a REST dogma. Building a spec without
> RESTful design is fine by me, as long as we recognize the tradeoffs.
> --
> Jeff
>
> On Mon, Apr 12, 2010 at 1:46 PM, Doug Cutting <cu...@apache.org> wrote:
>> Jeff Hodges wrote:
>>>
>>> To throw another set of ideas into the hat, SPDY[1][2] would be good
>>> to learn from. SPDY takes the basics of HTTP and makes them fast.
>>> Benefits we would enjoy include:
>>>
>>> * Multiplexed streams
>>> * Request prioritization
>>> * HTTP header compression
>>> * Server push
>>>
>>> Currently in draft form.
>>>
>>> [1] http://dev.chromium.org/spdy/spdy-whitepaper
>>> [2] http://dev.chromium.org/spdy/spdy-protocol/spdy-protocol-draft2
>>
>> I like that SPDY is more actively developed than BEEP.  It would be nice not
>> to have to re-implement clients and servers from scratch, and to perhaps
>> even use a pre-existing specfication.
>>
>> SPDY does fix one of the primary restrictions of HTTP in that it permits
>> request multiplexing: responses need not arrive in order.
>>
>> However other concerns folks have had about HTTP are that:
>>  a. text headers are big and slow to process
>>  b. SSL/TLS is heavyweight and inflexible for authentication
>>
>> SPDY addresses the size of headers by compressing them, but that may hinder
>> the speed of header processing.
>>
>> SPDY uses SSL/TLS, so would have the same issues there.  Perhaps they could
>> be convinced to adopt SASL instead of TLS?
>>
>> Doug
>>
>

Re: Thoughts on an RPC protocol

Posted by Jeff Hodges <jh...@twitter.com>.
I hadn't thought of adding SASL to SPDY. I haven't dived into the
community, yet, to see if they've discussed it.

The philosophy behind BEEP seems pretty good, even if the crazy XML
style of the actual spec is not a good match (as put in an earlier
email). The lack of community is worrisome, of course, but having it
as an influence in our own design is probably worthwhile.

I do worry about the desire not to "name" things. URIs seem like such an
obvious good in a system that they should be consistent across any
implementation of the protocol and any instance of its use. However, I
recognize how big a bite that is to chew if we don't adopt a
substantial portion of an existing spec that already has that
defined. I'm not trying to push a REST dogma. Building a spec without
RESTful design is fine by me, as long as we recognize the tradeoffs.
--
Jeff

On Mon, Apr 12, 2010 at 1:46 PM, Doug Cutting <cu...@apache.org> wrote:
> Jeff Hodges wrote:
>>
>> To throw another set of ideas into the hat, SPDY[1][2] would be good
>> to learn from. SPDY takes the basics of HTTP and makes them fast.
>> Benefits we would enjoy include:
>>
>> * Multiplexed streams
>> * Request prioritization
>> * HTTP header compression
>> * Server push
>>
>> Currently in draft form.
>>
>> [1] http://dev.chromium.org/spdy/spdy-whitepaper
>> [2] http://dev.chromium.org/spdy/spdy-protocol/spdy-protocol-draft2
>
> I like that SPDY is more actively developed than BEEP.  It would be nice not
> to have to re-implement clients and servers from scratch, and to perhaps
> even use a pre-existing specfication.
>
> SPDY does fix one of the primary restrictions of HTTP in that it permits
> request multiplexing: responses need not arrive in order.
>
> However other concerns folks have had about HTTP are that:
>  a. text headers are big and slow to process
>  b. SSL/TLS is heavyweight and inflexible for authentication
>
> SPDY addresses the size of headers by compressing them, but that may hinder
> the speed of header processing.
>
> SPDY uses SSL/TLS, so would have the same issues there.  Perhaps they could
> be convinced to adopt SASL instead of TLS?
>
> Doug
>

Re: Thoughts on an RPC protocol

Posted by Doug Cutting <cu...@apache.org>.
Jeff Hodges wrote:
> To throw another set of ideas into the hat, SPDY[1][2] would be good
> to learn from. SPDY takes the basics of HTTP and makes them fast.
> Benefits we would enjoy include:
> 
> * Multiplexed streams
> * Request prioritization
> * HTTP header compression
> * Server push
> 
> Currently in draft form.
> 
> [1] http://dev.chromium.org/spdy/spdy-whitepaper
> [2] http://dev.chromium.org/spdy/spdy-protocol/spdy-protocol-draft2

I like that SPDY is more actively developed than BEEP.  It would be nice
not to have to re-implement clients and servers from scratch, and to
perhaps even use a pre-existing specification.

SPDY does fix one of the primary restrictions of HTTP in that it permits 
request multiplexing: responses need not arrive in order.

However other concerns folks have had about HTTP are that:
  a. text headers are big and slow to process
  b. SSL/TLS is heavyweight and inflexible for authentication

SPDY addresses the size of headers by compressing them, but that may 
hinder the speed of header processing.

SPDY uses SSL/TLS, so would have the same issues there.  Perhaps they 
could be convinced to adopt SASL instead of TLS?

Doug

Re: Thoughts on an RPC protocol

Posted by Jeff Hodges <jh...@twitter.com>.
To throw another set of ideas into the hat, SPDY[1][2] would be good
to learn from. SPDY takes the basics of HTTP and makes them fast.
Benefits we would enjoy include:

* Multiplexed streams
* Request prioritization
* HTTP header compression
* Server push

Currently in draft form.

[1] http://dev.chromium.org/spdy/spdy-whitepaper
[2] http://dev.chromium.org/spdy/spdy-protocol/spdy-protocol-draft2
--
Jeff
On Fri, Apr 9, 2010 at 2:29 PM, Doug Cutting <cu...@apache.org> wrote:
> Scott Carey wrote:
>>
>> I also have not wrapped my head around routing/proxy use cases.  From
>> a somewhat ignorant perspective on them -- I'd rather have a solid
>> point-to-point protocol that just works, is simple, and can meet the
>> vast majority of use cases with high performance than one that
>> happens to be capable of sophisticated routing but has a lot of other
>> limitations or is a lot harder to implement and debug.
>
> FWIW, they're theoretical at this point.  I was only stating that prefixing
> every request and response with handshakes makes stuff like proxies trivial,
> since the protocol becomes stateless.  Once we start having sessions things
> get trickier.  For example, many HTTP client libraries cache connections,
> so, if you're building on top of one of those, it's hard to know when a new
> connection is opened.
>
> One approach is to declare that the current framing and handshake rules only
> apply to HTTP, currently our only standard transport.  Then we can define a
> new transport that's point-to-point, stateful, etc. which may handle framing
> and handshakes differently.  Thus we can retain back-compatibility.  Make
> sense?
>
> Doug
>

Re: Thoughts on an RPC protocol

Posted by Doug Cutting <cu...@apache.org>.
Scott Carey wrote:
> I also have not wrapped my head around routing/proxy use cases.  From
> a somewhat ignorant perspective on them -- I'd rather have a solid
> point-to-point protocol that just works, is simple, and can meet the
> vast majority of use cases with high performance than one that
> happens to be capable of sophisticated routing but has a lot of other
> limitations or is a lot harder to implement and debug.

FWIW, they're theoretical at this point.  I was only stating that 
prefixing every request and response with handshakes makes stuff like 
proxies trivial, since the protocol becomes stateless.  Once we start 
having sessions things get trickier.  For example, many HTTP client 
libraries cache connections, so, if you're building on top of one of 
those, it's hard to know when a new connection is opened.

One approach is to declare that the current framing and handshake rules 
only apply to HTTP, currently our only standard transport.  Then we can 
define a new transport that's point-to-point, stateful, etc. which may 
handle framing and handshakes differently.  Thus we can retain 
back-compatibility.  Make sense?

Doug

Re: Thoughts on an RPC protocol

Posted by Scott Carey <sc...@richrelevance.com>.
On Apr 9, 2010, at 11:56 AM, Bo Shi wrote:

> On Fri, Apr 9, 2010 at 2:35 AM, Bruce Mitchener
> <br...@gmail.com> wrote:
>> Doug,
>> 
>> I'm happy to hear that you like this approach!
>> 
>> Allocation of channels seems to be something specific to an application.  In
>> my app, I'd have a channel for the streaming data that is constantly
>> arriving and a channel for making requests on and getting back answers
>> immediately.  Others could have a channel per object or whatever.
> 
> One ubiquitous protocol that shares many of the same requirements and
> properties (in particular multiplexed channels over a transport) is
> SSH.  Their channel mechanism may provide additional inspiration:
> [http://tools.ietf.org/html/rfc4254#section-5].  One interesting bit
> is that SSH doesn't have an explicit channel for control commands,
> instead they create additional message types for control messages that
> aren't associated with any channel.  It's only a minor distinction
> though.
> 

One flaw in SSH is that the bandwidth over a WAN is often pathetic.
Because it has multiple channels, it implements its own flow control and receive windows.  The effective bandwidth is the window size divided by the RTT.  Many implementations have a hard-coded 64KB receive buffer -- over a connection with a 30ms RTT this is peak data throughput of 2MB/sec.  The latest implementation versions of SSH patch the issue by having an automatically growing buffer, but the in memory buffer size required for high throughput transfer over higher latency links is large.

http://www.docstoc.com/docs/18191581/High-Performance-Networking-with-the-SSH-Protocol/
http://www.psc.edu/networking/projects/tcptune/
http://www.psc.edu/networking/projects/hpn-ssh/

This arises out of flow control to make sure that channels do not interfere with each other too much.

HTTP does not have this problem -- it just uses TCP's flow control, and its only multi-channel-like feature, HTTP pipelining, doesn't bother with flow control.

Whatever Avro does for a custom socket based transport should avoid this.  This sort of use case will likely be common (distributed copy of HDFS across a WAN for example). 
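
The arithmetic behind that 2MB/sec figure is just window size divided by
round-trip time; a trivial illustration:

    def window_limited_throughput(window_bytes, rtt_seconds):
        # At most one window of data can be in flight per round trip.
        return window_bytes / rtt_seconds

    print(window_limited_throughput(64 * 1024, 0.030))        # ~2.2 MB/sec with a 64KB window
    print(window_limited_throughput(4 * 1024 * 1024, 0.030))  # ~140 MB/sec with a 4MB window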

> 
> If, as you suggest above, we enforce a 1-1 mapping of channel and avro
> protocol, wouldn't that eliminate the need for a handshake on
> subsequent requests on the same channel?  The handshake process could
> be part of the open-channel negotiation.  I'm still wrapping my head
> around the routing use-case; not sure if this meets the requirements
> there though.

I also have not wrapped my head around routing/proxy use cases.  From a somewhat ignorant perspective on them -- I'd rather have a solid point-to-point protocol that just works, is simple, and can meet the vast majority of use cases with high performance than one that happens to be capable of sophisticated routing but has a lot of other limitations or is a lot harder to implement and debug.


>>> 
>> 


Re: Thoughts on an RPC protocol

Posted by Bo Shi <bs...@gmail.com>.
On Fri, Apr 9, 2010 at 2:35 AM, Bruce Mitchener
<br...@gmail.com> wrote:
> Doug,
>
> I'm happy to hear that you like this approach!
>
> Allocation of channels seems to be something specific to an application.  In
> my app, I'd have a channel for the streaming data that is constantly
> arriving and a channel for making requests on and getting back answers
> immediately.  Others could have a channel per object or whatever.

One ubiquitous protocol that shares many of the same requirements and
properties (in particular multiplexed channels over a transport) is
SSH.  Its channel mechanism may provide additional inspiration:
[http://tools.ietf.org/html/rfc4254#section-5].  One interesting bit
is that SSH doesn't have an explicit channel for control commands;
instead, it defines additional message types for control messages that
aren't associated with any channel.  It's only a minor distinction,
though.

>
> Are your proxy servers custom software or are they just passing traffic
> along directly? If they're Avro-aware, then they can manage the handshaking
> process when routing to a new peer.  Is this something that is actively
> happening today or just something that is possible?
>
> I definitely agree about not wanting a handshake per request. For my
> application that would add a lot of overhead in terms of the data
> transmitted.  (I'm sending a lot of small requests, hopefully many thousands
> per second...)  I would be much much happier being able to have a handshake
> per connection (or per channel open).
>

If, as you suggest above, we enforce a 1-1 mapping of channel and avro
protocol, wouldn't that eliminate the need for a handshake on
subsequent requests on the same channel?  The handshake process could
be part of the open-channel negotiation.  I'm still wrapping my head
around the routing use-case; not sure if this meets the requirements
there though.

>  - Bruce
>
> On Thu, Apr 8, 2010 at 4:43 PM, Doug Cutting <cu...@apache.org> wrote:
>
>> Bruce,
>>
>> Overall this looks like a good approach to me.
>>
>> How do you anticipate allocating channels?  I'm guessing this would be one
>> per client object, that a pool of open connections to servers would be
>> maintained, and creating a new client object would allocate a new channel.
>>
>> Currently we perform a handshake per request.  This is fairly cheap and
>> permits things like routing through proxy servers.  Different requests over
>> the same connection can talk to different backend servers running different
>> versions of the protocol.  Also consider the case where, between calls on an
>> object, the connection times out, and a new session is established and a new
>> handshake must take place.
>>
>> That said, having a session where the handshake can be assumed vastly
>> simplifies one-way messages.  Without a response or error on which to prefix
>> a handshake response, a one-way client has no means to know that the server
>> was able to even parse its request.  Yet we'd still like a handshake for
>> one-way messages, so that clients and servers need not be versioned in
>> lockstep.  So the handshake-per-request model doesn't serve one-way messages
>> well.
>>
>> How can we address both of these needs: to permit flexible payload routing
>> and efficient one-way messaging?
>>
>> Doug
>>
>>
>> Bruce Mitchener wrote:
>>
>>>  * Think about adding something for true one-way messages, but an empty
>>> reply frame is probably sufficient, since that still allows reporting
>>> errors
>>> if needed (or desired).
>>>
>>
>

Re: Thoughts on an RPC protocol

Posted by Bruce Mitchener <br...@gmail.com>.
On Fri, Apr 9, 2010 at 10:00 AM, Scott Carey <sc...@richrelevance.com>wrote:

>
> On Apr 8, 2010, at 11:35 PM, Bruce Mitchener wrote:
>
> > Doug,
> >
> > I'm happy to hear that you like this approach!
> >
> > Allocation of channels seems to be something specific to an application.
>  In
> > my app, I'd have a channel for the streaming data that is constantly
> > arriving and a channel for making requests on and getting back answers
> > immediately.  Others could have a channel per object or whatever.
>
> If this is all on one TCP port, then channels will interfere with one
> another somewhat -- the transport layer will see packets arrive in the order
> they were sent.  If one packet in your streaming data stalls, both channels
> will stall.  Depending on the application requirements, this might be fine.
>  But it should be made clear that channels are not independent, they are
> just interleaved over one ordered data stream.  How each implementation
> orders sending data on one end will affect order on the other side.


Agreed, that's just a fact of life with TCP.  Perhaps if SCTP ever gets some
traction, then people can do a mapping for that.  In the meantime, we could
look at what RFC 3081 did in the TCP mapping for RFC 3080 with respect to
flow control.

> I definitely agree about not wanting a handshake per request. For my
> > application that would add a lot of overhead in terms of the data
> > transmitted.  (I'm sending a lot of small requests, hopefully many
> thousands
> > per second...)  I would be much much happier being able to have a
> handshake
> > per connection (or per channel open).
> >
>
> Handshake per request will limit WAN usage.  Doubling request latency isn't
> a problem for local networks with sub 0.1ms RTTs, but it is a problem with
> 25ms RTTs.  Round trips aren't free on the processing or bandwidth side
> either.   If there is a way to meet most goals and limit extra handshakes to
> specific cases that would be a significant performance improvement.


We agree very strongly here.

 - Bruce

Re: Thoughts on an RPC protocol

Posted by Scott Carey <sc...@richrelevance.com>.
On Apr 8, 2010, at 11:35 PM, Bruce Mitchener wrote:

> Doug,
> 
> I'm happy to hear that you like this approach!
> 
> Allocation of channels seems to be something specific to an application.  In
> my app, I'd have a channel for the streaming data that is constantly
> arriving and a channel for making requests on and getting back answers
> immediately.  Others could have a channel per object or whatever.

If this is all on one TCP port, then channels will interfere with one another somewhat -- the transport layer will see packets arrive in the order they were sent.  If one packet in your streaming data stalls, both channels will stall.  Depending on the application requirements, this might be fine.  But it should be made clear that channels are not independent, they are just interleaved over one ordered data stream.  How each implementation orders sending data on one end will affect order on the other side.

> 
> I definitely agree about not wanting a handshake per request. For my
> application that would add a lot of overhead in terms of the data
> transmitted.  (I'm sending a lot of small requests, hopefully many thousands
> per second...)  I would be much much happier being able to have a handshake
> per connection (or per channel open).
> 

Handshake per request will limit WAN usage.  Doubling request latency isn't a problem for local networks with sub 0.1ms RTTs, but it is a problem with 25ms RTTs.  Round trips aren't free on the processing or bandwidth side either.   If there is a way to meet most goals and limit extra handshakes to specific cases that would be a significant performance improvement.

> - Bruce
> 
> On Thu, Apr 8, 2010 at 4:43 PM, Doug Cutting <cu...@apache.org> wrote:
> 
>> Bruce,
>> 
>> Overall this looks like a good approach to me.
>> 
>> How do you anticipate allocating channels?  I'm guessing this would be one
>> per client object, that a pool of open connections to servers would be
>> maintained, and creating a new client object would allocate a new channel.
>> 
>> Currently we perform a handshake per request.  This is fairly cheap and
>> permits things like routing through proxy servers.  Different requests over
>> the same connection can talk to different backend servers running different
>> versions of the protocol.  Also consider the case where, between calls on an
>> object, the connection times out, and a new session is established and a new
>> handshake must take place.
>> 
>> That said, having a session where the handshake can be assumed vastly
>> simplifies one-way messages.  Without a response or error on which to prefix
>> a handshake response, a one-way client has no means to know that the server
>> was able to even parse its request.  Yet we'd still like a handshake for
>> one-way messages, so that clients and servers need not be versioned in
>> lockstep.  So the handshake-per-request model doesn't serve one-way messages
>> well.
>> 
>> How can we address both of these needs: to permit flexible payload routing
>> and efficient one-way messaging?
>> 
>> Doug
>> 
>> 
>> Bruce Mitchener wrote:
>> 
>>> * Think about adding something for true one-way messages, but an empty
>>> reply frame is probably sufficient, since that still allows reporting
>>> errors
>>> if needed (or desired).
>>> 
>> 


Re: Thoughts on an RPC protocol

Posted by Bruce Mitchener <br...@gmail.com>.
Doug,

I'm happy to hear that you like this approach!

Allocation of channels seems to be something specific to an application.  In
my app, I'd have a channel for the streaming data that is constantly
arriving and a channel for making requests on and getting back answers
immediately.  Others could have a channel per object or whatever.

Are your proxy servers custom software or are they just passing traffic
along directly? If they're Avro-aware, then they can manage the handshaking
process when routing to a new peer.  Is this something that is actively
happening today or just something that is possible?

I definitely agree about not wanting a handshake per request. For my
application that would add a lot of overhead in terms of the data
transmitted.  (I'm sending a lot of small requests, hopefully many thousands
per second...)  I would be much much happier being able to have a handshake
per connection (or per channel open).

 - Bruce

On Thu, Apr 8, 2010 at 4:43 PM, Doug Cutting <cu...@apache.org> wrote:

> Bruce,
>
> Overall this looks like a good approach to me.
>
> How do you anticipate allocating channels?  I'm guessing this would be one
> per client object, that a pool of open connections to servers would be
> maintained, and creating a new client object would allocate a new channel.
>
> Currently we perform a handshake per request.  This is fairly cheap and
> permits things like routing through proxy servers.  Different requests over
> the same connection can talk to different backend servers running different
> versions of the protocol.  Also consider the case where, between calls on an
> object, the connection times out, and a new session is established and a new
> handshake must take place.
>
> That said, having a session where the handshake can be assumed vastly
> simplifies one-way messages.  Without a response or error on which to prefix
> a handshake response, a one-way client has no means to know that the server
> was able to even parse its request.  Yet we'd still like a handshake for
> one-way messages, so that clients and servers need not be versioned in
> lockstep.  So the handshake-per-request model doesn't serve one-way messages
> well.
>
> How can we address both of these needs: to permit flexible payload routing
> and efficient one-way messaging?
>
> Doug
>
>
> Bruce Mitchener wrote:
>
>>  * Think about adding something for true one-way messages, but an empty
>> reply frame is probably sufficient, since that still allows reporting
>> errors
>> if needed (or desired).
>>
>

Re: Thoughts on an RPC protocol

Posted by Jeremy Custenborder <jc...@gmail.com>.
I really like the model that Voldemort uses for their protocol buffers
implementation. I ported their client to .NET and it was really
simple. The framing is really simple: the binary data is prefixed with an
integer length. The binary data is a request made with a protocol
buffer, and so is the response; the response is a protocol buffer
message specific to the method called. They used enums for the method
names. For streaming results, it prefixes a 1 if there is another
record and either a 0 or -1 (I forget and need to look at the code.
You get the idea) if it's the end of the stream. I like the idea of
keeping things as simple as possible because it makes it easier for
additional languages to be added quickly. Personally I would prefer
an easy request-then-response model that blocks on the client
side. For simplicity a client could just block and wait for the data
to return, or a more advanced client could use callbacks or I/O
completion ports. This would allow a client to use things like
connection pooling to handle concurrency of multiple requests.
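
A rough sketch of reading the kind of stream described above (the exact
marker values and layout are from memory, so treat them as placeholders, and
each record is assumed to be length-prefixed itself):

    import struct

    def read_stream(sock_file):
        # Sketch only: a marker byte of 1 means another record follows; any
        # other value (0 or -1 in the recollection above) ends the stream.
        records = []
        while True:
            marker = sock_file.read(1)
            if not marker or marker[0] != 1:
                break
            (length,) = struct.unpack(">i", sock_file.read(4))
            records.append(sock_file.read(length))
        return records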

As for routing, I typically don't like using a proxy because that
limits you to the interface bandwidth of the proxy appliance. For
example, if you use something like a NetScaler as your proxy in front
of your back-end servers, you will have 50 back-end servers with gig
connections trying to feed a couple of NetScalers, which constrains the
bandwidth. Adding more bandwidth means adding more proxy boxes, which
gets expensive fast, especially with NetScalers. Voldemort uses the
concept of node banning: if a node doesn't respond quickly enough or
has errors, it will get banned for a period of time. Couple this with
something like SRV records in DNS and you can easily steer your
traffic without using a proxy, saving you some cash.

I'm currently working on the .NET port of Avro, and the RPC
implementation is one of my priorities. My goal is to get to the
point where I can have a utility that connects to the RPC server,
gets the protocol handshake, then builds out strongly typed code that a
developer can work against.

{
  "namespace": "com.acme",
  "protocol": "HelloWorld",
  "doc": "Protocol Greetings",

  "types": [
    {"name": "Greeting", "type": "record", "fields": [
      {"name": "message", "type": "string"}]},
    {"name": "Curse", "type": "error", "fields": [
      {"name": "message", "type": "string"}]}
  ],

  "messages": {
    "hello": {
      "doc": "Say hello.",
      "request": [{"name": "greeting", "type": "Greeting" }],
      "response": "Greeting",
      "errors": ["Curse"]
    }
  }
}

would generate

namespace com.acme
{
    public class HelloWorld : Avro.Protocol
    {
        public Greeting hello(Greeting greeting)
    }
}

On Thu, Apr 8, 2010 at 3:43 PM, Doug Cutting <cu...@apache.org> wrote:
> Bruce,
>
> Overall this looks like a good approach to me.
>
> How do you anticipate allocating channels?  I'm guessing this would be one
> per client object, that a pool of open connections to servers would be
> maintained, and creating a new client object would allocate a new channel.
>
> Currently we perform a handshake per request.  This is fairly cheap and
> permits things like routing through proxy servers.  Different requests over
> the same connection can talk to different backend servers running different
> versions of the protocol.  Also consider the case where, between calls on an
> object, the connection times out, and a new session is established and a new
> handshake must take place.
>
> That said, having a session where the handshake can be assumed vastly
> simplifies one-way messages.  Without a response or error on which to prefix
> a handshake response, a one-way client has no means to know that the server
> was able to even parse its request.  Yet we'd still like a handshake for
> one-way messages, so that clients and servers need not be versioned in
> lockstep.  So the handshake-per-request model doesn't serve one-way messages
> well.
>
> How can we address both of these needs: to permit flexible payload routing
> and efficient one-way messaging?
>
> Doug
>
> Bruce Mitchener wrote:
>>
>>  * Think about adding something for true one-way messages, but an empty
>> reply frame is probably sufficient, since that still allows reporting
>> errors
>> if needed (or desired).
>

Re: Thoughts on an RPC protocol

Posted by Doug Cutting <cu...@apache.org>.
Bruce,

Overall this looks like a good approach to me.

How do you anticipate allocating channels?  I'm guessing this would be 
one per client object, that a pool of open connections to servers would 
be maintained, and creating a new client object would allocate a new 
channel.

Currently we perform a handshake per request.  This is fairly cheap and 
permits things like routing through proxy servers.  Different requests 
over the same connection can talk to different backend servers running 
different versions of the protocol.  Also consider the case where, 
between calls on an object, the connection times out, and a new session 
is established and a new handshake must take place.
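
As a rough illustration of why that stateless, handshake-per-request shape
makes proxying easy (the helper names here are hypothetical, not Avro's
actual API): because the serialized handshake travels with every request, a
proxy can treat each frame independently and forward it to any backend.

    def send_request(transport, handshake_bytes, call_bytes):
        # Hypothetical sketch: each request is self-describing because the
        # serialized handshake travels with it, so no per-connection state
        # is needed to interpret it.
        transport.write_frame(handshake_bytes + call_bytes)

    def proxy_one_request(client_transport, pick_backend):
        # A proxy can route purely per-request: read one frame, pick any
        # backend, forward it, and relay the response back.
        frame = client_transport.read_frame()
        backend = pick_backend(frame)
        backend.write_frame(frame)
        client_transport.write_frame(backend.read_frame())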

That said, having a session where the handshake can be assumed vastly 
simplifies one-way messages.  Without a response or error on which to 
prefix a handshake response, a one-way client has no means to know that 
the server was able to even parse its request.  Yet we'd still like a 
handshake for one-way messages, so that clients and servers need not be 
versioned in lockstep.  So the handshake-per-request model doesn't serve 
one-way messages well.

How can we address both of these needs: to permit flexible payload 
routing and efficient one-way messaging?

Doug

Bruce Mitchener wrote:
>  * Think about adding something for true one-way messages, but an empty
> reply frame is probably sufficient, since that still allows reporting errors
> if needed (or desired).