Posted to commits@avro.apache.org by bl...@apache.org on 2016/11/05 20:20:27 UTC

[10/35] avro git commit: AVRO-1704: Add single-record encoding spec. (Contributed by Niels Basjes)

AVRO-1704: Add single-record encoding spec. (Contributed by Niels Basjes)


Project: http://git-wip-us.apache.org/repos/asf/avro/repo
Commit: http://git-wip-us.apache.org/repos/asf/avro/commit/1c9ef72b
Tree: http://git-wip-us.apache.org/repos/asf/avro/tree/1c9ef72b
Diff: http://git-wip-us.apache.org/repos/asf/avro/diff/1c9ef72b

Branch: refs/heads/branch-1.8
Commit: 1c9ef72b4b7f3b34d16748c7161ec26b193e7299
Parents: b550367
Author: Ryan Blue <bl...@apache.org>
Authored: Sun Jul 24 15:47:36 2016 -0700
Committer: Ryan Blue <bl...@apache.org>
Committed: Sat Nov 5 13:15:08 2016 -0700

----------------------------------------------------------------------
 CHANGES.txt                    |  2 ++
 doc/src/content/xdocs/spec.xml | 36 ++++++++++++++++++++++++++++++++----
 2 files changed, 34 insertions(+), 4 deletions(-)
----------------------------------------------------------------------
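
For readers of this commit, the framing that the new spec.xml section describes (the two-byte marker `C3 01`, then the 8-byte little-endian schema fingerprint, then the Avro binary body) can be sketched roughly as follows. This is an illustrative sketch only, not code from the Avro project: the function names are hypothetical, the CRC-64-AVRO fingerprint is assumed to be computed elsewhere and passed in as an integer, and the body is assumed to be already Avro-binary-encoded.

```python
import struct

# Two-byte marker from the spec text: single-object encoding, version 1.
MARKER = b"\xc3\x01"
HEADER_LEN = len(MARKER) + 8  # marker plus the 8-byte fingerprint


def encode_single_object(fingerprint: int, body: bytes) -> bytes:
    """Frame an already-encoded Avro body (hypothetical helper).

    `fingerprint` is assumed to be the schema's CRC-64-AVRO value, computed
    elsewhere. Masking to 64 bits lets signed values (as a Java long would
    be) pack the same way as their unsigned equivalents.
    """
    return MARKER + struct.pack("<Q", fingerprint & 0xFFFFFFFFFFFFFFFF) + body


def decode_single_object(message: bytes):
    """Return (fingerprint, body), or None if the marker check fails."""
    if len(message) < HEADER_LEN or not message.startswith(MARKER):
        return None  # cheap rejection before any schema lookup is attempted
    (fingerprint,) = struct.unpack_from("<Q", message, len(MARKER))
    return fingerprint, message[HEADER_LEN:]


if __name__ == "__main__":
    framed = encode_single_object(0x0102030405060708, b"\x02")
    print(framed.hex())
    print(decode_single_object(framed))
    print(decode_single_object(b"not avro"))
```

Checking the marker first mirrors the rationale given in the spec text: a payload that does not start with `C3 01` is rejected before any fingerprint-to-schema lookup happens. Resolving the returned fingerprint to a schema and decoding the body are left out of this sketch.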


http://git-wip-us.apache.org/repos/asf/avro/blob/1c9ef72b/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 0490e86..537b2b2 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -8,6 +8,8 @@ Trunk (not yet released)
 
     AVRO-1704: Java: Add support for single-message encoding. (blue)
 
+    AVRO-1704: Spec: Add single-message encoding format. (Niels Basjes via blue)
+
   OPTIMIZATIONS
 
   IMPROVEMENTS

http://git-wip-us.apache.org/repos/asf/avro/blob/1c9ef72b/doc/src/content/xdocs/spec.xml
----------------------------------------------------------------------
diff --git a/doc/src/content/xdocs/spec.xml b/doc/src/content/xdocs/spec.xml
index ec1f199..917d314 100644
--- a/doc/src/content/xdocs/spec.xml
+++ b/doc/src/content/xdocs/spec.xml
@@ -487,18 +487,18 @@
               value, followed by that many key/value pairs.  A block
               with count zero indicates the end of the map.  Each item
               is encoded per the map's value schema.</p>
-	
+
             <p>If a block's count is negative, its absolute value is used,
               and the count is followed immediately by a <code>long</code>
               block <em>size</em> indicating the number of bytes in the
               block.  This block size permits fast skipping through data,
               e.g., when projecting a record to a subset of its fields.</p>
-	
+
             <p>The blocked representation permits one to read and write
               maps larger than can be buffered in memory, since one can
               start writing items without knowing the full length of the
               map.</p>
-	
+
 	  </section>
 
           <section id="union_encoding">
@@ -569,6 +569,34 @@
 
       </section>
 
+      <section id="single_object_encoding">
+        <title>Single-object encoding</title>
+
+        <p>In some situations a single Avro serialized object is to be stored for a
+        longer period of time. One very common example is storing Avro records
+        for several weeks in an <a href="http://kafka.apache.org/">Apache Kafka</a> topic.</p>
+        <p>In the period after a schema change this persistence system will contain records
+        that have been written with different schemas. So the need arises to know which schema
+        was used to write a record to support schema evolution correctly.
+        In most cases the schema itself is too large to include in the message,
+        so this binary wrapper format supports the use case more effectively.</p>
+
+        <section id="single_object_encoding_spec">
+          <title>Single object encoding specification</title>
+          <p>Single Avro objects are encoded as follows:</p>
+          <ol>
+            <li>A two-byte marker, <code>C3 01</code>, to show that the message is Avro and uses this single-record format (version 1).</li>
+            <li>The 8-byte little-endian CRC-64-AVRO <a href="#schema_fingerprints">fingerprint</a> of the object's schema</li>
+            <li>The Avro object encoded using <a href="#binary_encoding">Avro's binary encoding</a></li>
+          </ol>
+        </section>
+
+        <p>Implementations use the 2-byte marker to determine whether a payload is Avro.
+          This check helps avoid expensive lookups that resolve the schema from a
+          fingerprint, when the message is not an encoded Avro payload.</p>
+
+      </section>
+
     </section>
 
     <section id="order">
@@ -1237,7 +1265,7 @@
         </ul>
       </section>
 
-      <section>
+      <section id="schema_fingerprints">
         <title>Schema Fingerprints</title>
 
         <p>"[A] fingerprinting algorithm is a procedure that maps an