Posted to commits@avro.apache.org by fo...@apache.org on 2019/07/04 09:36:48 UTC

[avro] branch master updated: AVRO-2441: Add quickstart guide for Python3 (#558)

This is an automated email from the ASF dual-hosted git repository.

fokko pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/avro.git


The following commit(s) were added to refs/heads/master by this push:
     new 9beb79d  AVRO-2441: Add quickstart guide for Python3 (#558)
9beb79d is described below

commit 9beb79def587e4ee7056a91cc3e21d6392b0ff94
Author: Kengo Seki <se...@apache.org>
AuthorDate: Thu Jul 4 18:36:43 2019 +0900

    AVRO-2441: Add quickstart guide for Python3 (#558)
    
    * AVRO-2441: Add quickstart guide for Python3
    
    * AVRO-2441: Add quickstart guide for Python3
    
    Add a description about installation with pip
    
    * AVRO-2441: Add quickstart guide for Python3
    
    Remove an unnecessary comment
    
    * AVRO-2441: Add quickstart guide for Python3
    
    Make the installation commands consistent to install
    Avro onto the current user's local directory.
---
 doc/src/content/xdocs/gettingstartedpython3.xml | 235 ++++++++++++++++++++++++
 doc/src/content/xdocs/site.xml                  |   3 +-
 2 files changed, 237 insertions(+), 1 deletion(-)

diff --git a/doc/src/content/xdocs/gettingstartedpython3.xml b/doc/src/content/xdocs/gettingstartedpython3.xml
new file mode 100644
index 0000000..46ededa
--- /dev/null
+++ b/doc/src/content/xdocs/gettingstartedpython3.xml
@@ -0,0 +1,235 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+   https://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+  -->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN"
+   "https://forrest.apache.org/dtd/document-v20.dtd" [
+  <!ENTITY % avro-entities PUBLIC "-//Apache//ENTITIES Avro//EN"
+	   "../../../../build/avro.ent">
+  %avro-entities;
+]>
+<document>
+  <header>
+    <title>Apache Avro&#153; &AvroVersion; Getting Started (Python3)</title>
+  </header>
+  <body>
+    <p>
+      This is a short guide for getting started with Apache Avro&#153; using
+      Python3.  This guide only covers using Avro for data serialization; see
+      Patrick Hunt's <a href="https://github.com/phunt/avro-rpc-quickstart">Avro
+      RPC Quick Start</a> for a good introduction to using Avro for RPC.
+    </p>
+
+    <section id="download_install">
+      <title>Download</title>
+      <p>
+        Avro implementations for C, C++, C#, Java, PHP, Python, and Ruby can be
+        downloaded from the <a
+        href="https://avro.apache.org/releases.html">Apache Avro&#153;
+        Releases</a> page.  This guide uses Avro &AvroVersion;, the latest
+        version at the time of writing.  Download and extract
+        <em>avro-python3-&AvroVersion;.tar.gz</em>, then install it with
+        <code>python3 setup.py install --user</code>.  Ensure that you can
+        <code>import avro</code> from a Python prompt.
+      </p>
+      <source>
+$ tar xvf avro-python3-&AvroVersion;.tar.gz
+$ cd avro-python3-&AvroVersion;
+$ python3 setup.py install --user
+$ python3
+>>> import avro # should not raise ImportError
+      </source>
+      <p>
+        Avro is also published on <a href="https://pypi.org/">PyPI</a> as
+        <a href="https://pypi.org/project/avro-python3/">the avro-python3
+        package</a>, so you can install it with pip instead:
+      </p>
+      <source>
+$ pip3 install --user avro-python3
+$ python3
+>>> import avro # should not raise ImportError
+      </source>
+      <p>
+        Alternatively, you can build the Avro Python library from source.  From
+        the root of the Avro source tree, run the following commands:
+      </p>
+      <source>
+$ cd lang/py3/
+$ python3 setup.py install --user
+$ python3
+>>> import avro # should not raise ImportError
+      </source>
+    </section>
+
+    <section>
+      <title>Defining a schema</title>
+      <p>
+        Avro schemas are defined using JSON.  Schemas are composed of <a
+        href="spec.html#schema_primitive">primitive types</a>
+        (<code>null</code>, <code>boolean</code>, <code>int</code>,
+        <code>long</code>, <code>float</code>, <code>double</code>,
+        <code>bytes</code>, and <code>string</code>) and <a
+        href="spec.html#schema_complex">complex types</a> (<code>record</code>,
+        <code>enum</code>, <code>array</code>, <code>map</code>,
+        <code>union</code>, and <code>fixed</code>).  You can learn more about
+        Avro schemas and types from the specification, but for now let's start
+        with a simple schema example, <em>user.avsc</em>:
+      </p>
+      <source>
+{"namespace": "example.avro",
+ "type": "record",
+ "name": "User",
+ "fields": [
+     {"name": "name", "type": "string"},
+     {"name": "favorite_number",  "type": ["int", "null"]},
+     {"name": "favorite_color", "type": ["string", "null"]}
+ ]
+}
+      </source>
+      <p>
+        This schema defines a record representing a hypothetical user.  (Note
+        that a schema file can only contain a single schema definition.)  At
+        minimum, a record definition must include its type (<code>"type":
+        "record"</code>), a name (<code>"name": "User"</code>), and fields, in
+        this case <code>name</code>, <code>favorite_number</code>, and
+        <code>favorite_color</code>.  We also define a namespace
+        (<code>"namespace": "example.avro"</code>), which together with the name
+        attribute defines the "full name" of the schema
+        (<code>example.avro.User</code> in this case).
+
+      </p>
+      <p>
+        Fields are defined via an array of objects, each of which defines a name
+        and type (other attributes are optional, see the <a
+        href="spec.html#schema_record">record specification</a> for more
+        details).  The type attribute of a field is another schema object, which
+        can be either a primitive or complex type.  For example, the
+        <code>name</code> field of our User schema is the primitive type
+        <code>string</code>, whereas the <code>favorite_number</code> and
+        <code>favorite_color</code> fields are both <code>union</code>s,
+        represented by JSON arrays.  <code>union</code>s are a complex type that
+        can be any of the types listed in the array; e.g.,
+        <code>favorite_number</code> can either be an <code>int</code> or
+        <code>null</code>, essentially making it an optional field.
+      </p>
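+      <p>
+        Because a schema is ordinary JSON, both ideas above can be checked with
+        the standard <code>json</code> module alone.  The sketch below does not
+        use the Avro library; <code>full_name</code> and
+        <code>optional_fields</code> are hypothetical helpers.
+        <code>full_name</code> follows the specification's rule that a dotted
+        name is already a full name, but ignores namespaces inherited from an
+        enclosing schema.  <code>optional_fields</code> lists the fields whose
+        union includes <code>null</code>.
+      </p>
```python
import json

# The contents of user.avsc, inlined so the sketch runs on its own.
schema = json.loads("""
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
""")


def full_name(named_schema):
    """Combine namespace and name into the schema's full name."""
    name = named_schema["name"]
    namespace = named_schema.get("namespace")
    # A name that contains a dot is already a full name; any namespace
    # given alongside it is ignored.
    if "." in name or not namespace:
        return name
    return namespace + "." + name


def optional_fields(record_schema):
    """Names of fields whose type is a union that includes "null"."""
    return [
        field["name"]
        for field in record_schema["fields"]
        # A union is a JSON array, which json.loads turns into a list.
        if isinstance(field["type"], list) and "null" in field["type"]
    ]


print(full_name(schema))        # example.avro.User
print(optional_fields(schema))  # ['favorite_number', 'favorite_color']
```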
+    </section>
+
+    <section>
+      <title>Serializing and deserializing without code generation</title>
+      <p>
+        Data in Avro is always stored with its corresponding schema, meaning we
+        can always read a serialized item, regardless of whether we know the
+        schema ahead of time.  This allows us to perform serialization and
+        deserialization without code generation.  Note that the Avro Python
+        library does not support code generation.
+      </p>
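+      <p>
+        To show why a reader needs no generated code, here is a stdlib-only
+        sketch of the object container file header from the specification.
+        The header holds the magic bytes, a metadata map whose
+        <code>avro.schema</code> entry stores the writer's schema as JSON, and
+        a 16-byte sync marker.  The helper names are hypothetical.  The sketch
+        writes and reads only the header, not data blocks, and it is not how
+        <code>avro.datafile</code> is implemented.
+      </p>
```python
import io
import json
import os

MAGIC = b"Obj\x01"  # the bytes "Obj" followed by the version byte 1


def write_long(n):
    """Zigzag-encode n, then write it as a base-128 varint."""
    z = n * 2 if n >= 0 else -n * 2 - 1
    out = bytearray()
    while True:
        byte, z = z % 0x80, z // 0x80
        if z:
            out.append(byte + 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_long(stream):
    """Inverse of write_long."""
    z, multiplier = 0, 1
    while True:
        b = stream.read(1)[0]
        z += (b % 0x80) * multiplier
        multiplier *= 0x80
        if b // 0x80 == 0:  # high bit clear: last byte
            break
    return z // 2 if z % 2 == 0 else -(z // 2) - 1


def write_header(stream, schema_json):
    """Write the magic, a metadata map carrying the schema, and a sync marker."""
    metadata = {"avro.schema": schema_json.encode("utf-8"), "avro.codec": b"null"}
    stream.write(MAGIC)
    stream.write(write_long(len(metadata)))  # one block holding every entry
    for key, value in metadata.items():
        for item in (key.encode("utf-8"), value):  # length-prefixed key, value
            stream.write(write_long(len(item)) + item)
    stream.write(write_long(0))  # a zero-count block ends the map
    stream.write(os.urandom(16))  # random 16-byte sync marker


def read_schema(stream):
    """Recover the writer's schema from the start of a container file."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    count = read_long(stream)
    while count != 0:
        if 0 > count:  # negative count: the block's byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = stream.read(read_long(stream)).decode("utf-8")
            metadata[key] = stream.read(read_long(stream))
        count = read_long(stream)
    stream.read(16)  # skip the sync marker
    return json.loads(metadata["avro.schema"].decode("utf-8"))


buf = io.BytesIO()
write_header(buf, '{"type": "record", "name": "User", "fields": []}')
buf.seek(0)
print(read_schema(buf))  # {'type': 'record', 'name': 'User', 'fields': []}
```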
+      <p>
+        Try running the following code snippet, which serializes two users to a
+        data file on disk, and then reads back and deserializes the data file:
+      </p>
+      <source>
+import avro.schema
+from avro.datafile import DataFileReader, DataFileWriter
+from avro.io import DatumReader, DatumWriter
+
+schema = avro.schema.Parse(open("user.avsc", "rb").read())
+
+writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
+writer.append({"name": "Alyssa", "favorite_number": 256})
+writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
+writer.close()
+
+reader = DataFileReader(open("users.avro", "rb"), DatumReader())
+for user in reader:
+    print(user)
+reader.close()
+      </source>
+      <p>This outputs:</p>
+      <source>
+{'name': 'Alyssa', 'favorite_number': 256, 'favorite_color': None}
+{'name': 'Ben', 'favorite_number': 7, 'favorite_color': 'red'}
+      </source>
+      <p>
+        Make sure that you open your files in binary mode (i.e. using the
+        modes <code>wb</code> or <code>rb</code> respectively).  The Avro
+        reader and writer operate on bytes, whereas a Python 3 file opened in
+        text mode reads and writes <code>str</code>.  Text mode also performs
+        <a href="https://docs.python.org/library/functions.html#open">
+        automatic replacement</a> of newline characters with their
+        platform-specific representations, which would corrupt binary data.
+      </p>
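+      <p>
+        The following stdlib-only snippet demonstrates both problems with a
+        throwaway temporary file.  In Python 3, a text-mode file refuses bytes
+        on write.  On read, universal-newline handling turns a CR LF pair into
+        a single LF, so the bytes no longer match what was written.  The
+        payload is an arbitrary example, not a real Avro file.
+      </p>
```python
import os
import tempfile

payload = b"Obj\x01\r\n"  # arbitrary bytes that happen to contain CR LF

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")

    # Writing bytes to a text-mode file fails outright in Python 3.
    try:
        with open(path, "w") as f:
            f.write(payload)
        text_write_failed = False
    except TypeError:
        text_write_failed = True

    # In binary mode the bytes reach the disk unchanged.
    with open(path, "wb") as f:
        f.write(payload)

    # Reading in text mode translates "\r\n" to "\n" on every platform.
    with open(path, "r", encoding="latin-1") as f:
        text_read = f.read().encode("latin-1")

    with open(path, "rb") as f:
        binary_read = f.read()

print(text_write_failed)       # True
print(binary_read == payload)  # True
print(text_read == payload)    # False
```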
+      <p>
+        Let's take a closer look at what's going on here.
+      </p>
+      <source>
+schema = avro.schema.Parse(open("user.avsc", "rb").read())
+      </source>
+      <p>
+        <code>avro.schema.Parse</code> takes a string containing a JSON schema
+        definition as input and returns an <code>avro.schema.Schema</code> object
+        (specifically a subclass of <code>Schema</code>, in this case
+        <code>RecordSchema</code>).  We're passing in the contents of our
+        user.avsc schema file here.
+      </p>
+      <source>
+writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
+      </source>
+      <p>
+        We create a <code>DataFileWriter</code>, which we'll use to write
+        serialized items to a data file on disk.  The
+        <code>DataFileWriter</code> constructor takes three arguments:
+      </p>
+        <ul>
+          <li>The file we'll serialize to</li>
+          <li>A <code>DatumWriter</code>, which is responsible for actually
+          serializing the items to Avro's binary format.</li>
+          <li>The schema we're using.  The <code>DataFileWriter</code> needs the
+          schema to write it into the data file, to verify that the items we
+          append are valid, and to determine which fields to write.</li>
+        </ul>
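+        <p>
+          A toy checker can illustrate the validation mentioned in the last
+          item.  This is a hypothetical sketch that covers only the types in
+          user.avsc (records, unions, <code>null</code>, <code>string</code>,
+          <code>int</code>).  It is not the Avro library's own validation code.
+          Missing entries are looked up with <code>dict.get</code>, so an absent
+          field passes only if its type admits <code>null</code>.
+        </p>
```python
# The User schema from user.avsc, written as a Python dict.
USER = {
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_number", "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]},
    ],
}

INT_RANGE = range(-(2 ** 31), 2 ** 31)  # Avro int is a 32-bit signed integer


def is_valid(schema, datum):
    """Toy validity check covering only the types used by user.avsc."""
    if isinstance(schema, list):
        # Union: the datum is valid if any branch accepts it.
        return any(is_valid(branch, datum) for branch in schema)
    if isinstance(schema, dict) and schema["type"] == "record":
        # Every field must be valid; extra dict entries are never inspected.
        return isinstance(datum, dict) and all(
            is_valid(field["type"], datum.get(field["name"]))
            for field in schema["fields"]
        )
    if schema == "null":
        return datum is None
    if schema == "string":
        return isinstance(datum, str)
    if schema == "int":
        # bool is a subclass of int in Python, so rule it out explicitly.
        return (isinstance(datum, int) and not isinstance(datum, bool)
                and datum in INT_RANGE)
    raise NotImplementedError("type not covered by this sketch: %r" % (schema,))


print(is_valid(USER, {"name": "Alyssa", "favorite_number": 256}))  # True
print(is_valid(USER, {"favorite_number": 7}))  # False: "name" is missing
print(is_valid(USER, {"name": "Ben", "favorite_number": "seven"}))  # False
```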
+        <source>
+writer.append({"name": "Alyssa", "favorite_number": 256})
+writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
+        </source>
+        <p>
+          We use <code>DataFileWriter.append</code> to add items to our data
+          file.  Avro records are represented as Python <code>dict</code>s.
+          Since the field <code>favorite_color</code> has type <code>["string",
+          "null"]</code>, we are not required to specify this field, as shown in
+          the first append.  Were we to omit the required <code>name</code>
+          field, an exception would be raised.  Any extra entries in the
+          <code>dict</code> that do not correspond to a field are ignored.
+        </p>
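+        <p>
+          To make concrete what the <code>DatumWriter</code> produces for one
+          record, here is a stdlib-only sketch of the specification's binary
+          encoding, limited to the types in user.avsc.  Fields are written in
+          schema order with no names or tags.  A union is written as the
+          zero-based index of its branch, followed by the value.  Ints and
+          string lengths use zigzag variable-length encoding.
+          <code>encode</code> and <code>encode_long</code> are hypothetical
+          helpers, not the Avro library's API.  The omitted
+          <code>favorite_color</code> becomes the <code>null</code> branch, and
+          extra <code>dict</code> entries are never read.
+        </p>
```python
# The User schema from user.avsc, written as a Python dict.
USER = {
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_number", "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]},
    ],
}

# Python types accepted by each union branch in this sketch.
PY_TYPES = {"null": type(None), "string": str, "int": int}


def encode_long(n):
    """Zigzag-encode n, then write it as a base-128 varint, low groups first."""
    z = n * 2 if n >= 0 else -n * 2 - 1
    out = bytearray()
    while True:
        byte, z = z % 0x80, z // 0x80
        if z:
            out.append(byte + 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode(schema, datum):
    """Encode datum for the subset of types used by user.avsc."""
    if isinstance(schema, list):
        # Union: zero-based index of the first matching branch, then the value.
        for index, branch in enumerate(schema):
            if isinstance(datum, PY_TYPES[branch]):
                return encode_long(index) + encode(branch, datum)
        raise ValueError("datum matches no union branch: %r" % (datum,))
    if isinstance(schema, dict):
        # Record: field values in schema order, with no names or separators.
        return b"".join(
            encode(field["type"], datum.get(field["name"]))
            for field in schema["fields"]
        )
    if schema == "null":
        return b""  # null takes zero bytes
    if schema == "string":
        data = datum.encode("utf-8")
        return encode_long(len(data)) + data  # length prefix, then UTF-8
    if schema == "int":
        return encode_long(datum)
    raise NotImplementedError("type not covered by this sketch: %r" % (schema,))


print(encode(USER, {"name": "Alyssa", "favorite_number": 256}))
# b'\x0cAlyssa\x00\x80\x04\x02'
```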
+        <source>
+reader = DataFileReader(open("users.avro", "rb"), DatumReader())
+        </source>
+        <p>
+          We open the file again, this time for reading back from disk.  We use
+          a <code>DataFileReader</code> and <code>DatumReader</code> analogous
+          to the <code>DataFileWriter</code> and <code>DatumWriter</code> above.
+        </p>
+        <source>
+for user in reader:
+    print(user)
+        </source>
+        <p>
+          The <code>DataFileReader</code> is an iterator that returns
+          <code>dict</code>s corresponding to the serialized items.
+        </p>
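+        <p>
+          The same iteration can be mimicked with a small class that implements
+          <code>__iter__</code> and <code>__next__</code>.  This is a
+          hypothetical sketch, not <code>DataFileReader</code> itself.  It
+          ignores the container header, block structure, sync markers, and
+          schema resolution.  It simply decodes concatenated User records,
+          written in the specification's binary encoding, until no bytes
+          remain.  The byte string holds the two users from the example above,
+          encoded by hand.
+        </p>
```python
import io

# Field names and types of the User schema from user.avsc, in schema order.
USER_FIELDS = [
    ("name", "string"),
    ("favorite_number", ["int", "null"]),
    ("favorite_color", ["string", "null"]),
]


def read_long(stream):
    """Read one zigzag-encoded base-128 varint from a binary stream."""
    z, multiplier = 0, 1
    while True:
        b = stream.read(1)[0]
        z += (b % 0x80) * multiplier  # low 7 bits carry the payload
        multiplier *= 0x80
        if b // 0x80 == 0:  # high bit clear: this was the last byte
            break
    return z // 2 if z % 2 == 0 else -(z // 2) - 1  # undo zigzag


def read_value(stream, schema):
    """Decode one value for the subset of types used by user.avsc."""
    if isinstance(schema, list):  # union: branch index, then the value
        return read_value(stream, schema[read_long(stream)])
    if schema == "null":
        return None
    if schema == "string":
        return stream.read(read_long(stream)).decode("utf-8")
    if schema == "int":
        return read_long(stream)
    raise NotImplementedError("type not covered by this sketch: %r" % (schema,))


class RecordIterator:
    """Hypothetical stand-in for DataFileReader: yields one dict per record."""

    def __init__(self, data, fields):
        self._stream = io.BytesIO(data)
        self._size = len(data)
        self._fields = fields

    def __iter__(self):
        return self

    def __next__(self):
        if self._stream.tell() == self._size:  # every byte consumed
            raise StopIteration
        return {name: read_value(self._stream, t) for name, t in self._fields}


data = b"\x0cAlyssa\x00\x80\x04\x02" + b"\x06Ben\x00\x0e\x00\x06red"
for user in RecordIterator(data, USER_FIELDS):
    print(user)
# {'name': 'Alyssa', 'favorite_number': 256, 'favorite_color': None}
# {'name': 'Ben', 'favorite_number': 7, 'favorite_color': 'red'}
```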
+    </section>
+  </body>
+</document>
diff --git a/doc/src/content/xdocs/site.xml b/doc/src/content/xdocs/site.xml
index e77517a..6f86841 100644
--- a/doc/src/content/xdocs/site.xml
+++ b/doc/src/content/xdocs/site.xml
@@ -42,7 +42,8 @@ See https://forrest.apache.org/docs/linking.html for more info
   <docs label="Documentation">
     <overview   label="Overview"          href="index.html" />
     <gettingstartedjava label="Getting started (Java)" href="gettingstartedjava.html" />
-    <gettingstartedpython label="Getting started (Python)" href="gettingstartedpython.html" />
+    <gettingstartedpython label="Getting started (Python2)" href="gettingstartedpython.html" />
+    <gettingstartedpython label="Getting started (Python3)" href="gettingstartedpython3.html" />
     <spec       label="Specification"     href="spec.html" />
     <trevni     label="Trevni"            href="ext:trevni/spec" />
     <java-api   label="Java API"          href="ext:api/java/index" />