Posted to commits@avro.apache.org by cu...@apache.org on 2009/05/12 21:41:18 UTC

svn commit: r774047 - in /hadoop/avro/trunk/src: doc/content/xdocs/index.xml java/overview.html

Author: cutting
Date: Tue May 12 19:41:18 2009
New Revision: 774047

URL: http://svn.apache.org/viewvc?rev=774047&view=rev
Log:
Minor doc updates.  Copied javadoc overview to main doc.

Modified:
    hadoop/avro/trunk/src/doc/content/xdocs/index.xml
    hadoop/avro/trunk/src/java/overview.html

Modified: hadoop/avro/trunk/src/doc/content/xdocs/index.xml
URL: http://svn.apache.org/viewvc/hadoop/avro/trunk/src/doc/content/xdocs/index.xml?rev=774047&r1=774046&r2=774047&view=diff
==============================================================================
--- hadoop/avro/trunk/src/doc/content/xdocs/index.xml (original)
+++ hadoop/avro/trunk/src/doc/content/xdocs/index.xml Tue May 12 19:41:18 2009
@@ -1,20 +1,20 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <!--
-  Licensed to the Apache Software Foundation (ASF) under one or more
-  contributor license agreements.  See the NOTICE file distributed with
-  this work for additional information regarding copyright ownership.
-  The ASF licenses this file to You under the Apache License, Version 2.0
-  (the "License"); you may not use this file except in compliance with
-  the License.  You may obtain a copy of the License at
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
 
-      http://www.apache.org/licenses/LICENSE-2.0
+   http://www.apache.org/licenses/LICENSE-2.0
 
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License.
--->
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+  -->
 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
 <document>
   <header>
@@ -23,8 +23,65 @@
   <body>
     <section id="intro">
       <title>Introduction</title>
-      <p>Avro is...
-      </p>
+      <p>Avro is a data serialization system.</p>
+      <p>Avro provides:</p>
+	<ul>
+	  <li>Rich data structures.</li>
+	  <li>A compact, fast, binary data format.</li>
+	  <li>A container file, to store persistent data.</li>
+	  <li>Remote procedure call (RPC).</li>
+	  <li>Simple integration with dynamic languages.  Code
+	    generation is not required to read or write data files nor
+	    to use or implement RPC protocols.  Code generation is an
+	    optional optimization, only worth implementing for
+	    statically typed languages.</li>
+	</ul>
+    </section>
+    <section id="schemas">
+      <title>Schemas</title>
+      <p>Avro relies on <em>schemas</em>.  When Avro data is read, the
+	schema used when writing it is always present.  This permits
+	each datum to be written with no per-value overheads, making
+	serialization both fast and small.  This also facilitates use
+	with dynamic, scripting languages, since data, together with
+	its schema, is fully self-describing.</p>
+      <p>When Avro data is stored in a file, its schema is stored with
+	it, so that files may be processed later by any program.  If
+	the program reading the data expects a different schema this
+	can be easily resolved, since both schemas are present.</p>
+      <p>When Avro is used in RPC, the client and server exchange
+	schemas in the connection handshake.  (This can be optimized
+	so that, for most calls, no schemas are actually transmitted.)
+	Since client and server each have the other's full
+	schema, correspondence between same named fields, missing
+	fields, extra fields, etc. can all be easily resolved.</p>
+      <p>Avro schemas are defined
+	with <a href="http://www.json.org/">JSON</a>.  This
+	facilitates implementation in languages that already have
+	JSON libraries.</p>
+    </section>
+    <section id="compare">
+      <title>Comparison with other systems</title>
+      <p>Avro provides functionality similar to systems such
+	as <a href="http://incubator.apache.org/thrift/">Thrift</a>,
+	<a href="http://code.google.com/protobuf/">Protocol
+	  Buffers</a>, etc.  Avro differs from these systems in the
+	  following fundamental aspects.</p>
+      <ul>
+	<li><em>Dynamic typing</em>: Avro does not require that code
+	  be generated.  Data is always accompanied by a schema that
+	  permits full processing of that data without code
+	  generation, static datatypes, etc.  This facilitates
+	  construction of generic data-processing systems and
+	  languages.</li>
+	<li><em>Untagged data</em>: Since the schema is present when
+	  data is read, considerably less type information need be
+	  encoded with data, resulting in smaller serialization size.</li>
+	<li><em>No manually-assigned field IDs</em>: When a schema
+	  changes, both the old and new schema are always present when
+	  processing data, so differences may be resolved
+	  symbolically, using field names.</li>
+      </ul>
     </section>
   </body>
 </document>
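
The "Untagged data" point in the new Comparison section can be illustrated with a minimal, standard-library-only Java sketch. Because the reader always has the schema, a record can be written as its field values concatenated in schema order, with no field names, IDs, or type tags. The sketch assumes the zig-zag variable-length coding that the Avro specification uses for int and long values, and strings written as a length followed by UTF-8 bytes. The class UntaggedEncoding and the record layout {"name": string, "age": int} are hypothetical illustrations, not Avro's API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of untagged binary encoding (hypothetical helper, not Avro's API).
 * The example schema {"name": string, "age": int} is an assumption.
 */
public class UntaggedEncoding {

  /** Writes a long as a zig-zag variable-length integer. */
  public static void writeLong(ByteArrayOutputStream out, long n) {
    long z = (n << 1) ^ (n >> 63);           // zig-zag: small magnitudes map to small unsigned values
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80));  // low 7 bits, continuation bit set
      z >>>= 7;                              // unsigned shift to the next 7 bits
    }
    out.write((int) z);                      // final byte, continuation bit clear
  }

  /** Writes a string as its UTF-8 byte length followed by the bytes. */
  public static void writeString(ByteArrayOutputStream out, String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    writeLong(out, bytes.length);
    out.write(bytes, 0, bytes.length);
  }

  /**
   * Encodes the hypothetical record {"name": string, "age": int}: values are
   * concatenated in schema order, with no per-field names or type tags.
   */
  public static byte[] encodePerson(String name, int age) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeString(out, name);
    writeLong(out, age);                     // ints use the same zig-zag varint coding
    return out.toByteArray();
  }

  public static void main(String[] args) {
    StringBuilder hex = new StringBuilder();
    for (byte b : encodePerson("Ann", 30)) {
      hex.append(String.format("%02x ", b));
    }
    System.out.println(hex.toString().trim()); // prints "06 41 6e 6e 3c"
  }
}
```

The whole record takes five bytes: one for the string length, three for "Ann", one for the age. Only the schema tells a reader how to split them back into fields.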

Modified: hadoop/avro/trunk/src/java/overview.html
URL: http://svn.apache.org/viewvc/hadoop/avro/trunk/src/java/overview.html?rev=774047&r1=774046&r2=774047&view=diff
==============================================================================
--- hadoop/avro/trunk/src/java/overview.html (original)
+++ hadoop/avro/trunk/src/java/overview.html Tue May 12 19:41:18 2009
@@ -35,12 +35,12 @@
   program reading the data expects a different schema this can be
   easily resolved, since both schemas are present.
 
-  <p>When Avro is used in {@link org.apache.avro.ipc RPC}, a client
-  sends its request schema in the connection handshake.  Similarly, a
-  server delivers its response schema when a connection is
-  established.  Since both client and server both have the other's
-  full schema, correspondence between same named fields, missing
-  fields, extra fields, etc. can all be easily resolved.
+  <p>When Avro is used in {@link org.apache.avro.ipc RPC}, the client
+    and server exchange schemas in the connection handshake.  (This
+    can be optimized so that, for most calls, no schemas are actually
+    transmitted.)  Since client and server each have the other's
+    full schema, correspondence between same named fields, missing
+    fields, extra fields, etc. can all be easily resolved.
 
   <p>Avro schemas are defined with
   with <a href="http://www.json.org/">JSON</a> .  This facilitates
@@ -59,17 +59,14 @@
     full processing of that data without code generation, static
     datatypes, etc.  This facilitates construction of generic
     data-processing systems and languages.
+    <li><i>Untagged data</i>: Since the schema is present when data is
+    read, considerably less type information need be encoded with
+    data, resulting in smaller serialization size.</li>
     <li><i>No manually-assigned field IDs</i>: When a schema changes,
     both the old and new schema are always present when processing
-    data, so that differences may be easily resolved.
-    <li><i>Tiny core</i>: Adding full support for Avro to a new
-    programming requires little code.
+    data, so differences may be resolved symbolically, using field
+    names.
   </ul>  
 
-  <h2>Performance</h2>
-
-  <p>As an anectdotal benchmark, Avro can read data into generic Java
-  datastructures at over 60MB/s on my laptop.
-
 </body>
 </html>
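
The revised overview says that, with both the old and new schema present, differences "may be resolved symbolically, using field names." A minimal Java sketch of that idea follows. Fields present in both schemas are matched by name regardless of position, fields only the writer had are dropped, and fields only the reader expects take the reader's default. Schemas are modelled as simple field lists and decoded data as a map. The class NameResolution, its Field type, and the resolve method are hypothetical illustrations, not Avro's API. Treating a null default as "no default" is a simplification made for this sketch only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of name-based record resolution (hypothetical, not Avro's API). */
public class NameResolution {

  /** A reader-side field: its name and the value to use if the writer lacked it. */
  public static final class Field {
    final String name;
    final Object defaultValue; // null means "no default" in this simplified sketch

    public Field(String name, Object defaultValue) {
      this.name = name;
      this.defaultValue = defaultValue;
    }
  }

  /**
   * Builds the reader's view of a datum decoded with the writer's schema,
   * matching fields by name rather than by position or numeric ID.
   */
  public static Map<String, Object> resolve(Map<String, Object> written, List<Field> readerFields) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (Field f : readerFields) {
      if (written.containsKey(f.name)) {
        result.put(f.name, written.get(f.name));  // same-named field: copy the value
      } else if (f.defaultValue != null) {
        result.put(f.name, f.defaultValue);       // missing field: use the reader's default
      } else {
        throw new IllegalArgumentException("no value or default for field " + f.name);
      }
    }
    return result;                                 // extra writer-only fields are ignored
  }

  public static void main(String[] args) {
    Map<String, Object> written = new LinkedHashMap<>();
    written.put("name", "Ann");
    written.put("age", 30);
    written.put("nickname", "A");                  // extra field unknown to the reader
    List<Field> reader = List.of(
        new Field("age", null),                    // reordered relative to the writer
        new Field("name", null),
        new Field("email", "unknown"));            // new field with a default
    System.out.println(resolve(written, reader));  // prints {age=30, name=Ann, email=unknown}
  }
}
```

Because matching is by name, reordering, adding, or removing fields needs no manually assigned IDs; the reader only needs a default for any field the writer did not supply.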