Posted to dev@avro.apache.org by john malkovich <ck...@gmail.com> on 2009/12/08 10:11:11 UTC

handshake spec

Hello everyone,
thank you for such a wonderful project.
Unfortunately there is no Erlang implementation of Avro, so I have taken the
liberty of attempting one. As soon as I get something working I'll put
up the code, and if someone else is working on the same thing please let me
know - I'm more than open to collaboration, since my goal is to get and use a
working Erlang Avro lib.
I'm reading the Python implementation in detail, as well as the 1.3
specification. Unfortunately the spec is not clear (to me) in some parts, so
I would like to ask my questions here and hopefully someone can provide
some clues/answers.

Handshake request
The spec mentions that a hash of the JSON protocol schema is sent on each
request to the server:

{
  "type": "record",
  "name": "HandshakeRequest", "namespace":"org.apache.avro.ipc",
  "fields": [
    {"name": "clientHash",
     "type": {"type": "fixed", "name": "MD5", "size": 16}},
    {"name": "clientProtocol", "type": ["null", "string"]},
    {"name": "serverHash", "type": "MD5"},
    {"name": "meta", "type": ["null", {"type": "map", "values": "bytes"}]}
  ]
}

So the questions are:
- Should both "clientHash" and "serverHash" be replaced with the
actual hash of the protocol JSON definitions?
- What is the "server protocol"? If client and server are compatible,
don't they both use the same protocol definition?
- Does the "type": ["null", "string"] syntax mean that the "type" key has
either a "null" or a "string" value?

Re: handshake spec

Posted by Doug Cutting <cu...@apache.org>.
john malkovich wrote:
> So how is schema compatibility handled now in Java? I'm assuming that
> Java is the most compliant implementation. Otherwise you would have to count
> on runtime exceptions to pick up incompatibilities, which is what I think I
> saw in the Python implementation (I could be wrong). What am I missing?

Currently we do not check such compatibility at handshake-time, but 
rather only at runtime.  This permits, e.g., a client's protocol to 
contain a message that the server does not implement.  So long as the 
client does not in fact send that message all's well.
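
As a rough illustration of checking at runtime rather than at handshake time, here is a minimal Python sketch. The names `Responder`, `UnknownMessageError`, `handshake` and `call` are hypothetical and are not taken from any Avro implementation; the point is only that the handshake accepts a client whose protocol lists extra messages, and the failure surfaces when such a message is really sent.

```python
class UnknownMessageError(Exception):
    """Raised when a client actually sends a message the server lacks."""


class Responder:
    def __init__(self, handlers):
        # handlers: message name -> callable implementing that message
        self.handlers = dict(handlers)

    def handshake(self, client_message_names):
        # Deliberately no name comparison here: extra client messages are
        # tolerated unless the client really sends one of them.
        return True

    def call(self, name, *args):
        handler = self.handlers.get(name)
        if handler is None:
            raise UnknownMessageError(name)
        return handler(*args)


server = Responder({"echo": lambda text: text})
# The client's protocol also declares "ping", which this server lacks.
accepted = server.handshake(["echo", "ping"])
reply = server.call("echo", "hello")
```

Only a call such as `server.call("ping")` would raise; a client that never sends "ping" is unaffected.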

We could perhaps tighten things by requiring that the message
names in a client's protocol be a strict subset of those on the server.
But I fear this would make protocol evolution more difficult in some
cases.  We expect a protocol to evolve primarily by adding new messages 
(or adding parameters to existing messages).  If an old client is 
talking to a new server, then the subset rule would work fine.  But if a 
new client is talking to an old server it would not.  In many cases this 
is perhaps appropriate.  However a client might be able to detect that 
it is talking to an old server (via, e.g., the absence of a new field in 
a prior response, or even a getVersion() message) and send only old
messages to old servers.  We don't want to prohibit such things.

The goal is not to automatically solve all protocol evolution problems, 
but rather to provide a means where varying versions can continue to 
interact with predictable outcomes, a platform that permits building 
systems that evolve.  My belief is that before someone can be confident 
that a new service works with old clients and that a new client works 
with old services, they need to run exhaustive tests between these. 
Attempting to determine this statically will limit functionality without 
providing real confidence.

Does this seem reasonable?  Or do others feel that more static checking 
at handshake time would be beneficial?

Doug

Re: handshake spec

Posted by john malkovich <ck...@gmail.com>.
Doug, thanks for sharing your knowledge,
it's appreciated.


On Tue, Dec 8, 2009 at 9:14 AM, Doug Cutting <cu...@apache.org> wrote:

> john malkovich wrote:
>
>> So the question is:
>> - Should both "clientHash" and "serverHash" be replaced with the
>> actual hash of the protocol JSON definitions?
>>
>
> Yes.  Note that the hashing is always done by the owner of the protocol
> definition, to make it immune from whitespace differences, etc.  So the
> client and server each compute the hash of their own protocol and use the
> hash of the other's as an opaque cache key.
>
>
>> - What is the "server protocol"? If client and server are compatible,
>> don't they both use the same protocol definition?
>>
>
> They might have different versions of a protocol.  Messages may have been
> added or removed.  In general, the only compatibility that's needed is that:
>  1. the server's schema has messages whose names match those of the
> client's actual requests;
>  2. the server's schema has default values for request parameters that the
> client does not provide; and
>  3. the client's schema has default values for fields of responses that the
> server does not provide.
>
> Only (1) is presently a hard requirement.  For (2) and (3) implementations
> may fail if they treat missing fields without default values as required, or
> might be able to proceed if they can handle these as unset.  There's been
> some discussion about mandating that such values are required.  But even if
> that were done then I'm not sure we'd check it at the time of the handshake,
> since the client may never send the offending messages.

So how is schema compatibility handled now in Java? I'm assuming that
Java is the most compliant implementation. Otherwise you would have to count
on runtime exceptions to pick up incompatibilities, which is what I think I
saw in the Python implementation (I could be wrong). What am I missing?

Doing it at handshake sounds like a sure way to avoid runtime exceptions (if
they even should be avoided in this case), but yes, it does add that constant
overhead even in cases where the conflict never happens.

>
>
>> - Does the "type": ["null", "string"] syntax mean that the "type" key has
>> either a "null" or a "string" value?
>>
>
> Yes.  JSON arrays are used to signify union types.
>
> Doug
>

thanks

Re: handshake spec

Posted by Doug Cutting <cu...@apache.org>.
john malkovich wrote:
> So the question is:
> - Should both "clientHash" and "serverHash" be replaced with the
> actual hash of the protocol JSON definitions?

Yes.  Note that the hashing is always done by the owner of the protocol 
definition, to make it immune from whitespace differences, etc.  So the 
client and server each compute the hash of their own protocol and use 
the hash of the other's as an opaque cache key.
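
A minimal Python sketch of this arrangement, under stated assumptions: the hash is taken to be MD5 over the UTF-8 bytes of the protocol text exactly as its owner holds it, and the helper names (`protocol_hash`, `RemoteProtocolCache`, `build_handshake_request`) are hypothetical. Sending a null clientProtocol when the peer is believed to know the protocol already reflects my reading of the spec, not something stated in this thread.

```python
import hashlib


def protocol_hash(protocol_text: str) -> bytes:
    """16-byte MD5 of the protocol text, computed only by its owner."""
    return hashlib.md5(protocol_text.encode("utf-8")).digest()


class RemoteProtocolCache:
    """Stores the other side's protocol text under the hash it announced.

    The key is opaque: it is never recomputed from the stored text, so
    whitespace differences in re-serialized JSON cannot cause a mismatch.
    """

    def __init__(self):
        self._protocols = {}

    def store(self, remote_hash: bytes, protocol_text: str) -> None:
        self._protocols[bytes(remote_hash)] = protocol_text

    def lookup(self, remote_hash: bytes):
        return self._protocols.get(bytes(remote_hash))


def build_handshake_request(client_text, server_hash, send_protocol):
    """Fill in the HandshakeRequest fields quoted earlier in the thread."""
    return {
        "clientHash": protocol_hash(client_text),
        # Assumption: null when the server is believed to know this protocol.
        "clientProtocol": client_text if send_protocol else None,
        "serverHash": server_hash,
        "meta": None,
    }


compact = '{"protocol":"Echo","messages":{}}'
spaced = '{"protocol": "Echo", "messages": {}}'

cache = RemoteProtocolCache()
cache.store(protocol_hash(spaced), spaced)
request = build_handshake_request(compact, protocol_hash(spaced), False)
```

The two texts above describe the same protocol but hash differently, which is why the receiver treats the announced hash as a lookup key instead of rehashing what it received.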

> - What is the "server protocol"? If client and server are compatible,
> don't they both use the same protocol definition?

They might have different versions of a protocol.  Messages may have 
been added or removed.  In general, the only compatibility that's needed 
is that:
  1. the server's schema has messages whose names match those of the 
client's actual requests;
  2. the server's schema has default values for request parameters that 
the client does not provide; and
  3. the client's schema has default values for fields of responses that 
the server does not provide.

Only (1) is presently a hard requirement.  For (2) and (3) 
implementations may fail if they treat missing fields without default 
values as required, or might be able to proceed if they can handle these 
as unset.  There's been some discussion about mandating that such values 
are required.  But even if that were done then I'm not sure we'd check 
it at the time of the handshake, since the client may never send the 
offending messages.
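
The three rules can be sketched as a per-message check in Python. This is a simplification under assumptions: protocols are plain dicts as parsed from JSON, every response is a record with a "fields" list, and the function names are hypothetical. It also treats (2) and (3) as reportable problems, which the thread notes is not presently mandated.

```python
def fields_without_value(reader_fields, writer_fields):
    """Reader fields that the writer does not send and that lack a default."""
    written = {field["name"] for field in writer_fields}
    return [
        field["name"]
        for field in reader_fields
        if field["name"] not in written and "default" not in field
    ]


def check_message(client, server, name):
    """Return a list of problems for one message the client intends to send."""
    client_msg = client["messages"][name]
    server_msg = server["messages"].get(name)
    if server_msg is None:
        return ["(1) server has no message named %r" % name]

    problems = []
    # (2) the server reads the request parameters that the client writes
    for field in fields_without_value(server_msg["request"], client_msg["request"]):
        problems.append("(2) server parameter %r has no default" % field)
    # (3) the client reads the response fields that the server writes
    for field in fields_without_value(
        client_msg["response"]["fields"], server_msg["response"]["fields"]
    ):
        problems.append("(3) client response field %r has no default" % field)
    return problems


def greet(request, response_fields):
    return {"messages": {"greet": {
        "request": request,
        "response": {"type": "record", "name": "Greeting",
                     "fields": response_fields},
    }}}


who = {"name": "who", "type": "string"}
text = {"name": "text", "type": "string"}

old_client = greet([who], [text])
new_server = greet([who, {"name": "lang", "type": "string", "default": "en"}], [text])
strict_server = greet([who, {"name": "lang", "type": "string"}], [text])
```

Calling the check only for messages the client is about to send mirrors the point above that unused messages need not be compatible.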

> - Does the "type": ["null", "string"] syntax mean that the "type" key has
> either a "null" or a "string" value?

Yes.  JSON arrays are used to signify union types.
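
To make the union reading concrete, here is a small Python sketch that tests whether a value fits a schema, covering only the types used in the HandshakeRequest quoted above. The function name `matches` is hypothetical and this is not how any particular Avro library validates data.

```python
def matches(schema, value):
    """True if value fits schema; handles only the types used in the thread."""
    if isinstance(schema, list):
        # A JSON array is a union: the value must fit at least one branch.
        return any(matches(branch, value) for branch in schema)
    if isinstance(schema, dict):
        if schema["type"] == "map":
            return isinstance(value, dict) and all(
                isinstance(key, str) and matches(schema["values"], item)
                for key, item in value.items()
            )
        if schema["type"] == "fixed":
            return isinstance(value, bytes) and len(value) == schema["size"]
        raise ValueError("unsupported schema in this sketch: %r" % schema)
    if schema == "null":
        return value is None
    if schema == "string":
        return isinstance(value, str)
    if schema == "bytes":
        return isinstance(value, bytes)
    raise ValueError("unsupported schema in this sketch: %r" % schema)


client_protocol_type = ["null", "string"]
meta_type = ["null", {"type": "map", "values": "bytes"}]
md5_type = {"type": "fixed", "name": "MD5", "size": 16}
```

So a clientProtocol field typed `["null", "string"]` may hold either nothing or a string, and the meta field may hold either nothing or a map of byte strings.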

Doug

Re: handshake spec

Posted by john malkovich <ck...@gmail.com>.
Great,
I've got some code too but it's pretty disconnected; as I was reading the
spec I was copying the Python implementation in Erlang and then simplifying
where obvious... I'm looking over your stuff.
I'm pretty new to git but I do use it day to day. What do you think would be
the best way for me to add comments/documentation to your code for starters?
Then as I familiarize myself I can add details where they're missing.

thanks

On Tue, Dec 8, 2009 at 6:07 PM, Todd Lipcon <to...@cloudera.com> wrote:

> On Tue, Dec 8, 2009 at 8:54 AM, john malkovich <ck...@gmail.com> wrote:
>
> > That's great! :)
> > I had a feeling something was up, that Twitter Digg hackathon post where
> > someone was hoping that an Erlang port was started?
> > Either way, please put it up. I'm no expert and the 1.2 spec is still a bit
> > unclear to me, but I definitely got more than a few things understood, so
> > hopefully I'll be able to pitch in work-wise.
> >
> >
> Hi John,
>
> My branch is up here:
> http://github.com/toddlipcon/avro/tree/erl
>
> It's nowhere near complete, woefully undercommented, and needs at least one
> big refactor before it's a good design :) But, then again, isn't that true
> of most software projects? ;-)
>
> But, it may at least get you started!
>
> -Todd

Re: handshake spec

Posted by Todd Lipcon <to...@cloudera.com>.
On Tue, Dec 8, 2009 at 8:54 AM, john malkovich <ck...@gmail.com> wrote:

> That's great! :)
> I had a feeling something was up, that Twitter Digg hackathon post where
> someone was hoping that an Erlang port was started?
> Either way, please put it up. I'm no expert and the 1.2 spec is still a bit
> unclear to me, but I definitely got more than a few things understood, so
> hopefully I'll be able to pitch in work-wise.
>
>
Hi John,

My branch is up here:
http://github.com/toddlipcon/avro/tree/erl

It's nowhere near complete, woefully undercommented, and needs at least one
big refactor before it's a good design :) But, then again, isn't that true
of most software projects? ;-)

But, it may at least get you started!

-Todd




Re: handshake spec

Posted by john malkovich <ck...@gmail.com>.
That's great! :)
I had a feeling something was up, that Twitter Digg hackathon post where
someone was hoping that an Erlang port was started?
Either way, please put it up. I'm no expert and the 1.2 spec is still a bit
unclear to me, but I definitely got more than a few things understood, so
hopefully I'll be able to pitch in work-wise.

thanks

On Tue, Dec 8, 2009 at 8:47 AM, Todd Lipcon <to...@cloudera.com> wrote:

> Hi John,
>
> Before you go too far with Erlang -- I have an implementation that's maybe
> half done that I started at the recent hackathon. I'll try to push this to a
> public repository so you can continue from there rather than starting fresh
> if you like.
>
> Thanks
> -Todd

Re: handshake spec

Posted by Todd Lipcon <to...@cloudera.com>.
Hi John,

Before you go too far with Erlang -- I have an implementation that's maybe
half done that I started at the recent hackathon. I'll try to push this to a
public repository so you can continue from there rather than starting fresh
if you like.

Thanks
-Todd
