Posted to commits@avro.apache.org by cu...@apache.org on 2009/05/12 21:41:18 UTC
svn commit: r774047 - in /hadoop/avro/trunk/src: doc/content/xdocs/index.xml
java/overview.html
Author: cutting
Date: Tue May 12 19:41:18 2009
New Revision: 774047
URL: http://svn.apache.org/viewvc?rev=774047&view=rev
Log:
Minor doc updates. Copied javadoc overview to main doc.
Modified:
hadoop/avro/trunk/src/doc/content/xdocs/index.xml
hadoop/avro/trunk/src/java/overview.html
Modified: hadoop/avro/trunk/src/doc/content/xdocs/index.xml
URL: http://svn.apache.org/viewvc/hadoop/avro/trunk/src/doc/content/xdocs/index.xml?rev=774047&r1=774046&r2=774047&view=diff
==============================================================================
--- hadoop/avro/trunk/src/doc/content/xdocs/index.xml (original)
+++ hadoop/avro/trunk/src/doc/content/xdocs/index.xml Tue May 12 19:41:18 2009
@@ -1,20 +1,20 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
- http://www.apache.org/licenses/LICENSE-2.0
+ http://www.apache.org/licenses/LICENSE-2.0
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
--->
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ -->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
<document>
<header>
@@ -23,8 +23,65 @@
<body>
<section id="intro">
<title>Introduction</title>
- <p>Avro is...
- </p>
+ <p>Avro is a data serialization system.</p>
+ <p>Avro provides:</p>
+ <ul>
+ <li>Rich data structures.</li>
+ <li>A compact, fast, binary data format.</li>
+ <li>A container file, to store persistent data.</li>
+ <li>Remote procedure call (RPC).</li>
+ <li>Simple integration with dynamic languages. Code
+ generation is not required to read or write data files nor
+ to use or implement RPC protocols. Code generation is an
+ optional optimization, only worth implementing for
+ statically typed languages.</li>
+ </ul>
+ </section>
+ <section id="schemas">
+ <title>Schemas</title>
+ <p>Avro relies on <em>schemas</em>. When Avro data is read, the
+ schema used when writing it is always present. This permits
+ each datum to be written with no per-value overheads, making
+ serialization both fast and small. This also facilitates use
+ with dynamic, scripting languages, since data, together with
+ its schema, is fully self-describing.</p>
+ <p>When Avro data is stored in a file, its schema is stored with
+ it, so that files may be processed later by any program. If
+ the program reading the data expects a different schema this
+ can be easily resolved, since both schemas are present.</p>
+ <p>When Avro is used in RPC, the client and server exchange
+ schemas in the connection handshake. (This can be optimized
+ so that, for most calls, no schemas are actually transmitted.)
+ Since client and server both have the other's full
+ schema, correspondence between same-named fields, missing
+ fields, extra fields, etc. can all be easily resolved.</p>
+ <p>Avro schemas are defined
+ with <a href="http://www.json.org/">JSON</a>. This
+ facilitates implementation in languages that already have
+ JSON libraries.</p>
+ </section>
+ <section id="compare">
+ <title>Comparison with other systems</title>
+ <p>Avro provides functionality similar to systems such
+ as <a href="http://incubator.apache.org/thrift/">Thrift</a>,
+ <a href="http://code.google.com/protobuf/">Protocol
+ Buffers</a>, etc. Avro differs from these systems in the
+ following fundamental aspects.</p>
+ <ul>
+ <li><em>Dynamic typing</em>: Avro does not require that code
+ be generated. Data is always accompanied by a schema that
+ permits full processing of that data without code
+ generation, static datatypes, etc. This facilitates
+ construction of generic data-processing systems and
+ languages.</li>
+ <li><em>Untagged data</em>: Since the schema is present when
+ data is read, considerably less type information need be
+ encoded with data, resulting in smaller serialization size.</li>
+ <li><em>No manually-assigned field IDs</em>: When a schema
+ changes, both the old and new schema are always present when
+ processing data, so differences may be resolved
+ symbolically, using field names.</li>
+ </ul>
</section>
</body>
</document>
Modified: hadoop/avro/trunk/src/java/overview.html
URL: http://svn.apache.org/viewvc/hadoop/avro/trunk/src/java/overview.html?rev=774047&r1=774046&r2=774047&view=diff
==============================================================================
--- hadoop/avro/trunk/src/java/overview.html (original)
+++ hadoop/avro/trunk/src/java/overview.html Tue May 12 19:41:18 2009
@@ -35,12 +35,12 @@
program reading the data expects a different schema this can be
easily resolved, since both schemas are present.
- <p>When Avro is used in {@link org.apache.avro.ipc RPC}, a client
- sends its request schema in the connection handshake. Similarly, a
- server delivers its response schema when a connection is
- established. Since both client and server both have the other's
- full schema, correspondence between same named fields, missing
- fields, extra fields, etc. can all be easily resolved.
+ <p>When Avro is used in {@link org.apache.avro.ipc RPC}, the client
+ and server exchange schemas in the connection handshake. (This
+ can be optimized so that, for most calls, no schemas are actually
+ transmitted.) Since client and server both have the other's
+ full schema, correspondence between same-named fields, missing
+ fields, extra fields, etc. can all be easily resolved.
<p>Avro schemas are defined with
with <a href="http://www.json.org/">JSON</a> . This facilitates
@@ -59,17 +59,14 @@
full processing of that data without code generation, static
datatypes, etc. This facilitates construction of generic
data-processing systems and languages.
+ <li><i>Untagged data</i>: Since the schema is present when data is
+ read, considerably less type information need be encoded with
+ data, resulting in smaller serialization size.</li>
<li><i>No manually-assigned field IDs</i>: When a schema changes,
both the old and new schema are always present when processing
- data, so that differences may be easily resolved.
- <li><i>Tiny core</i>: Adding full support for Avro to a new
- programming requires little code.
+ data, so differences may be resolved symbolically, using field
+ names.
</ul>
- <h2>Performance</h2>
-
- <p>As an anectdotal benchmark, Avro can read data into generic Java
- datastructures at over 60MB/s on my laptop.
-
</body>
</html>
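
The resolution rule described in the added text (old and new schemas are both present, so differences are resolved symbolically, using field names) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Avro's implementation or API: the two `User` schemas, the `resolve_record` helper, and the use of a `default` attribute to fill a field the writer did not have are all hypothetical choices made for the example.

```python
import json

# Hypothetical writer and reader schemas, defined with JSON as the docs
# describe. The record name and field names are invented for this sketch.
WRITER_SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "name", "type": "string"},
            {"name": "age", "type": "int"},
            {"name": "nickname", "type": "string"}]}
""")

READER_SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "age", "type": "int"},
            {"name": "name", "type": "string"},
            {"name": "email", "type": "string", "default": ""}]}
""")


def resolve_record(values, writer_schema, reader_schema):
    """Map values written in writer field order onto the reader's fields.

    Illustrative helper only (not part of any Avro library); it handles
    flat records and matches fields purely by name.
    """
    writer_names = [f["name"] for f in writer_schema["fields"]]
    written = dict(zip(writer_names, values))

    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            # Same-named field: take the written value, whatever its position.
            result[name] = written[name]
        elif "default" in field:
            # Field missing from the writer: assumed to fall back to a default.
            result[name] = field["default"]
        else:
            raise ValueError("no value or default for field %r" % name)
    # Extra writer fields (here "nickname") are simply not copied.
    return result


# The datum carries no field tags or IDs: just values in writer order.
datum = ["Ada", 36, "ada99"]
resolved = resolve_record(datum, WRITER_SCHEMA, READER_SCHEMA)
print(resolved)
```

Because the datum holds only values in the writer's order, the reader needs the writer's schema to interpret it at all; once it has that schema, reordered, missing, and extra fields are handled by name lookups rather than by manually assigned field IDs.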