Posted to commits@avro.apache.org by bl...@apache.org on 2016/09/04 20:43:22 UTC

avro git commit: AVRO-1704: Add single-record encoding spec. (Contributed by Niels Basjes)

Repository: avro
Updated Branches:
  refs/heads/master d7e123148 -> 30408a9c1


AVRO-1704: Add single-record encoding spec. (Contributed by Niels Basjes)


Project: http://git-wip-us.apache.org/repos/asf/avro/repo
Commit: http://git-wip-us.apache.org/repos/asf/avro/commit/30408a9c
Tree: http://git-wip-us.apache.org/repos/asf/avro/tree/30408a9c
Diff: http://git-wip-us.apache.org/repos/asf/avro/diff/30408a9c

Branch: refs/heads/master
Commit: 30408a9c192c5f4eaaf42f01f0ffbfffd705aa57
Parents: d7e1231
Author: Ryan Blue <bl...@apache.org>
Authored: Sun Jul 24 15:47:36 2016 -0700
Committer: Ryan Blue <bl...@apache.org>
Committed: Sun Sep 4 13:42:34 2016 -0700

----------------------------------------------------------------------
 CHANGES.txt                    |  2 ++
 doc/src/content/xdocs/spec.xml | 36 ++++++++++++++++++++++++++++++++----
 2 files changed, 34 insertions(+), 4 deletions(-)
----------------------------------------------------------------------
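
Note: the single-object layout that this commit adds to spec.xml (a two-byte
marker C3 01, then the 8-byte little-endian schema fingerprint, then the
binary-encoded object) can be sketched in a few lines. This is a minimal
illustration and not the Avro library's API: the helper names
frame_single_object and parse_single_object are hypothetical, the
fingerprint is assumed to have been computed elsewhere with CRC-64-AVRO,
and the body is assumed to be already encoded with Avro's binary encoding.

```python
import struct

# Two-byte marker from the spec text in this commit (format version 1).
MARKER = b"\xC3\x01"
# Marker plus the 8-byte fingerprint.
HEADER_LEN = len(MARKER) + 8


def frame_single_object(fingerprint: int, body: bytes) -> bytes:
    """Prefix an already binary-encoded Avro body with marker and fingerprint.

    Hypothetical helper: `fingerprint` is assumed to be the 64-bit
    CRC-64-AVRO fingerprint of the writer schema, computed elsewhere.
    """
    # "<Q" packs an unsigned 64-bit integer in little-endian byte order.
    return MARKER + struct.pack("<Q", fingerprint & 0xFFFFFFFFFFFFFFFF) + body


def parse_single_object(message: bytes):
    """Return (fingerprint, body), or None if the marker check fails.

    Checking the marker first avoids a schema lookup for payloads that
    are not single-object encoded.
    """
    if len(message) < HEADER_LEN or message[:len(MARKER)] != MARKER:
        return None
    (fingerprint,) = struct.unpack_from("<Q", message, len(MARKER))
    return fingerprint, message[HEADER_LEN:]


if __name__ == "__main__":
    # Placeholder values for illustration only, not a real schema fingerprint.
    framed = frame_single_object(0x0123456789ABCDEF, b"\x02")
    print(framed.hex())
    print(parse_single_object(framed))
```

A reader would typically use the returned fingerprint to look up the writer
schema (for example in a local cache or a schema registry) before decoding
the body; that lookup is outside the scope of this sketch.
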


http://git-wip-us.apache.org/repos/asf/avro/blob/30408a9c/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 19f921b..3e329aa 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -8,6 +8,8 @@ Trunk (not yet released)
 
     AVRO-1704: Java: Add support for single-message encoding. (blue)
 
+    AVRO-1704: Spec: Add single-message encoding format. (Niels Basjes via blue)
+
   OPTIMIZATIONS
 
   IMPROVEMENTS

http://git-wip-us.apache.org/repos/asf/avro/blob/30408a9c/doc/src/content/xdocs/spec.xml
----------------------------------------------------------------------
diff --git a/doc/src/content/xdocs/spec.xml b/doc/src/content/xdocs/spec.xml
index ec1f199..917d314 100644
--- a/doc/src/content/xdocs/spec.xml
+++ b/doc/src/content/xdocs/spec.xml
@@ -487,18 +487,18 @@
               value, followed by that many key/value pairs.  A block
               with count zero indicates the end of the map.  Each item
               is encoded per the map's value schema.</p>
-	
+
             <p>If a block's count is negative, its absolute value is used,
               and the count is followed immediately by a <code>long</code>
               block <em>size</em> indicating the number of bytes in the
               block.  This block size permits fast skipping through data,
               e.g., when projecting a record to a subset of its fields.</p>
-	
+
             <p>The blocked representation permits one to read and write
               maps larger than can be buffered in memory, since one can
               start writing items without knowing the full length of the
               map.</p>
-	
+
 	  </section>
 
           <section id="union_encoding">
@@ -569,6 +569,34 @@
 
       </section>
 
+      <section id="single_object_encoding">
+        <title>Single-object encoding</title>
+
+        <p>In some situations a single Avro serialized object is to be stored for a
+        longer period of time. One very common example is storing Avro records
+        for several weeks in an <a href="http://kafka.apache.org/">Apache Kafka</a> topic.</p>
+        <p>In the period after a schema change this persistence system will contain records
+        that have been written with different schemas. So the need arises to know which schema
+        was used to write a record to support schema evolution correctly.
+        In most cases the schema itself is too large to include in the message,
+        so this binary wrapper format supports the use case more effectively.</p>
+
+        <section id="single_object_encoding_spec">
+          <title>Single object encoding specification</title>
+          <p>Single Avro objects are encoded as follows:</p>
+          <ol>
+            <li>A two-byte marker, <code>C3 01</code>, to show that the message is Avro and uses this single-record format (version 1).</li>
+            <li>The 8-byte little-endian CRC-64-AVRO <a href="#schema_fingerprints">fingerprint</a> of the object's schema.</li>
+            <li>The Avro object encoded using <a href="#binary_encoding">Avro's binary encoding</a>.</li>
+          </ol>
+        </section>
+
+        <p>Implementations use the 2-byte marker to determine whether a payload is Avro.
+          This check helps avoid expensive lookups that resolve the schema from a
+          fingerprint, when the message is not an encoded Avro payload.</p>
+
+      </section>
+
     </section>
 
     <section id="order">
@@ -1237,7 +1265,7 @@
         </ul>
       </section>
 
-      <section>
+      <section id="schema_fingerprints">
         <title>Schema Fingerprints</title>
 
         <p>"[A] fingerprinting algorithm is a procedure that maps an