Posted to commits@arrow.apache.org by we...@apache.org on 2017/05/08 04:53:10 UTC
[25/27] arrow-site git commit: Update Python documentation
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/6360599f/docs/python/data.html
----------------------------------------------------------------------
diff --git a/docs/python/data.html b/docs/python/data.html
new file mode 100644
index 0000000..e16f145
--- /dev/null
+++ b/docs/python/data.html
@@ -0,0 +1,524 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+ <head>
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+
+ <title>In-Memory Data Model — pyarrow documentation</title>
+
+ <link rel="stylesheet" href="_static/sphinxdoc.css" type="text/css" />
+ <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+
+ <script type="text/javascript">
+ var DOCUMENTATION_OPTIONS = {
+ URL_ROOT: './',
+ VERSION: '',
+ COLLAPSE_INDEX: false,
+ FILE_SUFFIX: '.html',
+ HAS_SOURCE: true,
+ SOURCELINK_SUFFIX: '.txt'
+ };
+ </script>
+ <script type="text/javascript" src="_static/jquery.js"></script>
+ <script type="text/javascript" src="_static/underscore.js"></script>
+ <script type="text/javascript" src="_static/doctools.js"></script>
+ <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
+ <link rel="index" title="Index" href="genindex.html" />
+ <link rel="search" title="Search" href="search.html" />
+ <link rel="next" title="IPC: Fast Streaming and Serialization" href="ipc.html" />
+ <link rel="prev" title="Memory and IO Interfaces" href="memory.html" />
+ </head>
+ <body role="document">
+ <div class="related" role="navigation" aria-label="related navigation">
+ <h3>Navigation</h3>
+ <ul>
+ <li class="right" style="margin-right: 10px">
+ <a href="genindex.html" title="General Index"
+ accesskey="I">index</a></li>
+ <li class="right" >
+ <a href="ipc.html" title="IPC: Fast Streaming and Serialization"
+ accesskey="N">next</a> |</li>
+ <li class="right" >
+ <a href="memory.html" title="Memory and IO Interfaces"
+ accesskey="P">previous</a> |</li>
+ <li class="nav-item nav-item-0"><a href="index.html">pyarrow documentation</a> »</li>
+ </ul>
+ </div>
+ <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
+ <div class="sphinxsidebarwrapper">
+ <h3><a href="index.html">Table Of Contents</a></h3>
+ <ul>
+<li><a class="reference internal" href="#">In-Memory Data Model</a><ul>
+<li><a class="reference internal" href="#type-metadata">Type Metadata</a></li>
+<li><a class="reference internal" href="#schemas">Schemas</a></li>
+<li><a class="reference internal" href="#arrays">Arrays</a><ul>
+<li><a class="reference internal" href="#dictionary-arrays">Dictionary Arrays</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#record-batches">Record Batches</a></li>
+<li><a class="reference internal" href="#tables">Tables</a></li>
+<li><a class="reference internal" href="#custom-schema-and-field-metadata">Custom Schema and Field Metadata</a></li>
+</ul>
+</li>
+</ul>
+
+ <h4>Previous topic</h4>
+ <p class="topless"><a href="memory.html"
+ title="previous chapter">Memory and IO Interfaces</a></p>
+ <h4>Next topic</h4>
+ <p class="topless"><a href="ipc.html"
+ title="next chapter">IPC: Fast Streaming and Serialization</a></p>
+ <div role="note" aria-label="source link">
+ <h3>This Page</h3>
+ <ul class="this-page-menu">
+ <li><a href="_sources/data.rst.txt"
+ rel="nofollow">Show Source</a></li>
+ </ul>
+ </div>
+<div id="searchbox" style="display: none" role="search">
+ <h3>Quick search</h3>
+ <form class="search" action="search.html" method="get">
+ <div><input type="text" name="q" /></div>
+ <div><input type="submit" value="Go" /></div>
+ <input type="hidden" name="check_keywords" value="yes" />
+ <input type="hidden" name="area" value="default" />
+ </form>
+</div>
+<script type="text/javascript">$('#searchbox').show(0);</script>
+ </div>
+ </div>
+
+ <div class="document">
+ <div class="documentwrapper">
+ <div class="bodywrapper">
+ <div class="body" role="main">
+
+ <div class="section" id="in-memory-data-model">
+<span id="data"></span><h1>In-Memory Data Model<a class="headerlink" href="#in-memory-data-model" title="Permalink to this headline">¶</a></h1>
+<p>Apache Arrow defines columnar array data structures by composing type metadata
+with memory buffers, like the ones explained in the documentation on
+<a class="reference internal" href="memory.html#io"><span class="std std-ref">Memory and IO</span></a>. These data structures are exposed in Python through
+a series of interrelated classes:</p>
+<ul class="simple">
+<li><strong>Type Metadata</strong>: Instances of <code class="docutils literal"><span class="pre">pyarrow.DataType</span></code>, which describe a logical
+array type</li>
+<li><strong>Schemas</strong>: Instances of <code class="docutils literal"><span class="pre">pyarrow.Schema</span></code>, which describe a named
+collection of types. These can be thought of as the column types in a
+table-like object.</li>
+<li><strong>Arrays</strong>: Instances of <code class="docutils literal"><span class="pre">pyarrow.Array</span></code>, which are atomic, contiguous
+columnar data structures composed from Arrow Buffer objects</li>
+<li><strong>Record Batches</strong>: Instances of <code class="docutils literal"><span class="pre">pyarrow.RecordBatch</span></code>, which are a
+collection of Array objects with a particular Schema</li>
+<li><strong>Tables</strong>: Instances of <code class="docutils literal"><span class="pre">pyarrow.Table</span></code>, a logical table data structure in
+which each column consists of one or more <code class="docutils literal"><span class="pre">pyarrow.Array</span></code> objects of the
+same type.</li>
+</ul>
+<p>We will examine these in the sections below in a series of examples.</p>
+<div class="section" id="type-metadata">
+<span id="data-types"></span><h2>Type Metadata<a class="headerlink" href="#type-metadata" title="Permalink to this headline">¶</a></h2>
+<p>Apache Arrow defines language-agnostic, column-oriented data structures for
+array data. These include:</p>
+<ul class="simple">
+<li><strong>Fixed-length primitive types</strong>: numbers, booleans, dates and times,
+fixed-size binary, decimals, and other values that fit into a fixed number of bits</li>
+<li><strong>Variable-length primitive types</strong>: binary, string</li>
+<li><strong>Nested types</strong>: list, struct, and union</li>
+<li><strong>Dictionary type</strong>: An encoded categorical type (more on this later)</li>
+</ul>
+<p>Each logical data type in Arrow has a corresponding factory function for
+creating an instance of that type object in Python:</p>
+<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [1]: </span><span class="kn">import</span> <span class="nn">pyarrow</span> <span class="kn">as</span> <span class="nn">pa</span>
+
+<span class="gp">In [2]: </span><span class="n">t1</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">int32</span><span class="p">()</span>
+
+<span class="gp">In [3]: </span><span class="n">t2</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">string</span><span class="p">()</span>
+
+<span class="gp">In [4]: </span><span class="n">t3</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">binary</span><span class="p">()</span>
+
+<span class="gp">In [5]: </span><span class="n">t4</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">binary</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
+
+<span class="gp">In [6]: </span><span class="n">t5</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">timestamp</span><span class="p">(</span><span class="s1">'ms'</span><span class="p">)</span>
+
+<span class="gp">In [7]: </span><span class="n">t1</span>
+<span class="gh">Out[7]: </span><span class="go">DataType(int32)</span>
+
+<span class="gp">In [8]: </span><span class="k">print</span><span class="p">(</span><span class="n">t1</span><span class="p">)</span>
+<span class="go">