Posted to dev@avro.apache.org by "Amichai Rothman (JIRA)" <ji...@apache.org> on 2010/03/03 17:26:27 UTC

[jira] Updated: (AVRO-438) spec organization and clarification improvements

     [ https://issues.apache.org/jira/browse/AVRO-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amichai Rothman updated AVRO-438:
---------------------------------

    Status: Patch Available  (was: Open)

The patch fixes most of the issues. A few more thoughts:

- The block mechanism described for arrays and maps is essentially the same few paragraphs copied and pasted. Perhaps map serialization could simply be described as an array in which each item is a key immediately followed by its corresponding value?

- I added the binary encoding to the file format section, but not to the RPC section, because I found that part more confusing. The discussion of the HTTP transport says to use the "avro/binary" content type, which suggests there might later be an "avro/json" variant or something similar. So is the serialization format actually transport-dependent, and therefore not part of the spec? Perhaps there should be a separate section for the binary socket transport implementation?

- Furthermore, as I understand it, "avro/binary" is not a legal HTTP content type. It should be something more like "application/x-avro-binary" (or a type registered with IANA). But this digresses into changes to the spec itself, not just its wording. Should I open a separate issue for this? Is it a bug?

- As for the example: yes, it would be mostly binary, but it could be annotated to explain what each group of bytes means.
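To make the first suggestion concrete, here is a minimal Python sketch. It assumes the spec's zig-zag variable-length long encoding and length-prefixed UTF-8 strings, and it writes each collection as a single block for simplicity. The helper names (`encode_blocks`, `encode_map`, etc.) are hypothetical and not part of any Avro library. The point is that a map's bytes are exactly those of an array whose items are a key immediately followed by its value:

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def encode_long(n: int) -> bytes:
    """Zig-zag, then variable-length (7 bits per byte, low-order group first)."""
    z = (n << 1) ^ (n >> 63)  # assumes n fits in a signed 64-bit long
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """A long byte length followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_blocks(items: List[T], encode_item: Callable[[T], bytes]) -> bytes:
    """Array encoding: one block (count, then items), then a zero-count terminator."""
    out = b""
    if items:
        out += encode_long(len(items)) + b"".join(encode_item(i) for i in items)
    return out + encode_long(0)


def encode_map(m: Dict[str, T], encode_value: Callable[[T], bytes]) -> bytes:
    # Hypothetical rewording in code: a map is an array whose "items"
    # are a key immediately followed by its value.
    return encode_blocks(list(m.items()),
                         lambda kv: encode_string(kv[0]) + encode_value(kv[1]))
```

For example, `encode_blocks([3, 27], encode_long)` yields the bytes `04 06 36 00`, and `encode_map({"a": 1}, encode_long)` yields `02 02 61 02 00`: a count of one, the key "a", the value 1, and the terminating zero.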


> spec organization and clarification improvements
> ------------------------------------------------
>
>                 Key: AVRO-438
>                 URL: https://issues.apache.org/jira/browse/AVRO-438
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>    Affects Versions: 1.3.0
>            Reporter: Amichai Rothman
>            Priority: Trivial
>         Attachments: fix_spec_loose_ends.patch
>
>
> There are a few improvements that can be made to make the spec better organized and clarify ambiguous meanings:
> 1. The binary encoding section specifies string, then bytes, then long. However, the first two depend on the latter, so long encoding is effectively used before it is defined. In addition, string comes before bytes even though it is logically a special case of bytes. It would be clearer to order these as long, bytes, string, so that each definition builds on its predecessors and nothing is used before it is defined. Perhaps bytes and string should come after the other primitives, since they are technically more complex structures. It might also be a good idea to use this ordering everywhere in the spec where primitives are enumerated.
> 2. The sentence about array count and size is a bit confusing. A possible alternative:
> "If a block's count is negative, its absolute value is used, and the count is followed immediately by a long block size indicating the number of bytes in the block."
> Perhaps this should be immediately followed by the sentence explaining why this is useful, which currently appears a few lines below.
> 3. There is a note about blocks being at an experimental stage, but it is unclear whether this applies only to map blocks or also to array blocks.
> 4. Object Container Files and Protocol Declarations are described in the spec using JSON objects, and their schemas are shown, but the spec never says how they should be serialized. If they use binary serialization, the spec should say so explicitly. If they can be either binary or JSON, then the file has no self-describing way to distinguish the two; this should be addressed somewhere (perhaps with different magic words for binary and JSON content).
> 5. A Protocol Definition has a namespace and a name (the latter called "protocol"), but it is not clear whether the namespace rules defined in the first section apply to it. The spec should state this explicitly either way.
> 6. It would be extremely helpful to have a full sample of an RPC call over HTTP, possibly using the HelloWorld protocol from the previous example. This would show how the transport, framing, handshake, call format and messages all fit together. Examples in RFCs often help clear up misunderstandings that might arise from the body of the specification, which makes for a better spec, and such an example would be valuable here too.
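The ordering argument in item 1 can be seen directly in code. The following minimal Python sketch (function names are illustrative, not an Avro API) defines each primitive only in terms of the ones before it: long (zig-zag plus variable-length), then bytes (a long length followed by raw bytes), then string (UTF-8 text written as bytes):

```python
def encode_long(n: int) -> bytes:
    # Zig-zag maps signed to unsigned (0->0, -1->1, 1->2, -2->3, ...)
    # so values of small magnitude encode in few bytes.
    # Assumes n fits in a signed 64-bit long.
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    # Variable-length: 7 bits per byte, low-order group first,
    # high bit set on every byte except the last.
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_bytes(b: bytes) -> bytes:
    # bytes = a long length, then the raw bytes (builds on long).
    return encode_long(len(b)) + b


def encode_string(s: str) -> bytes:
    # string = its UTF-8 encoding written as bytes (builds on bytes).
    return encode_bytes(s.encode("utf-8"))
```

With this ordering, `encode_long(-64)` gives `7f`, `encode_long(64)` gives `80 01`, and `encode_string("foo")` gives `06 66 6f 6f`, with no definition referring forward to a later one.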
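The behavior described by the rewording proposed in item 2 can be sketched in Python, assuming the items are already encoded (all helper names here are hypothetical). A writer emits a negative count followed by the block's size in bytes, and a reader that does not need the contents can then skip each block without decoding its items:

```python
from typing import List, Tuple


def encode_long(n: int) -> bytes:
    z = (n << 1) ^ (n >> 63)  # zig-zag
    out = bytearray()
    while z > 0x7F:  # variable-length, low-order 7-bit groups first
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, position just after the encoded long)."""
    z, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return (z >> 1) ^ -(z & 1), pos  # undo zig-zag


def encode_sized_block(encoded_items: List[bytes]) -> bytes:
    # Negative count: |count| items follow, preceded by the block size in bytes.
    body = b"".join(encoded_items)
    return encode_long(-len(encoded_items)) + encode_long(len(body)) + body


def skip_blocks(buf: bytes, pos: int) -> int:
    """Skip an array or map written with sized blocks; return the position after it."""
    while True:
        count, pos = decode_long(buf, pos)
        if count == 0:  # zero count terminates the array or map
            return pos
        if count > 0:
            raise ValueError("positive count: items must be decoded with the schema")
        size, pos = decode_long(buf, pos)
        pos += size  # jump over the block without decoding its items
```

For the longs 3 and 27, the sized block is `03 04 06 36` (count -2, size 2, then the two items), and `skip_blocks` lands just past the terminating `00` without looking at the items.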
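As one small piece of the end-to-end example requested in item 6, here is a Python sketch of the message framing layer that sits between a serialized RPC message and an HTTP body sent as "avro/binary". It assumes the framing described in the spec's message framing section: each buffer is preceded by a four-byte big-endian length, and a message ends with a zero-length buffer. The handshake and call format are not shown, and the function names and default buffer size are hypothetical:

```python
import struct


def frame_message(payload: bytes, buffer_size: int = 8192) -> bytes:
    """Split payload into length-prefixed buffers and end with a zero-length one."""
    out = bytearray()
    for start in range(0, len(payload), buffer_size):
        chunk = payload[start:start + buffer_size]
        out += struct.pack(">I", len(chunk))  # four-byte big-endian length
        out += chunk
    out += struct.pack(">I", 0)  # zero-length buffer terminates the message
    return bytes(out)


def unframe_message(data: bytes) -> bytes:
    """Reassemble the payload from its framed buffers."""
    parts, pos = [], 0
    while True:
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if length == 0:
            return b"".join(parts)
        parts.append(data[pos:pos + length])
        pos += length
```

Framing `b"hello"` with a buffer size of 2 produces `00 00 00 02 68 65 | 00 00 00 02 6c 6c | 00 00 00 01 6f | 00 00 00 00`, the kind of byte dump that could be annotated in such an example.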
