Posted to commits@arrow.apache.org by uw...@apache.org on 2018/12/23 16:31:23 UTC
[16/51] [partial] arrow-site git commit: Upload nightly docs
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/62ef7145/docs/latest/python/data.html
----------------------------------------------------------------------
diff --git a/docs/latest/python/data.html b/docs/latest/python/data.html
new file mode 100644
index 0000000..392d1e3
--- /dev/null
+++ b/docs/latest/python/data.html
@@ -0,0 +1,982 @@
+
+
+
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+ <meta charset="utf-8">
+
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+ <title>Data Types and In-Memory Data Model — Apache Arrow v0.11.1.dev473+g6ed02454</title>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+ <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
+ <link rel="index" title="Index" href="../genindex.html" />
+ <link rel="search" title="Search" href="../search.html" />
+ <link rel="next" title="Streaming, Serialization, and IPC" href="ipc.html" />
+ <link rel="prev" title="Memory and IO Interfaces" href="memory.html" />
+
+
+ <script src="../_static/js/modernizr.min.js"></script>
+
+</head>
+
+<body class="wy-body-for-nav">
+
+
+ <div class="wy-grid-for-nav">
+
+
+ <nav data-toggle="wy-nav-shift" class="wy-nav-side">
+ <div class="wy-side-scroll">
+ <div class="wy-side-nav-search">
+
+
+
+ <a href="../index.html" class="icon icon-home"> Apache Arrow
+
+
+
+ </a>
+
+
+
+
+ <div class="version">
+ 0.11.1.dev473+g6ed02454
+ </div>
+
+
+
+
+<div role="search">
+ <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
+ <input type="text" name="q" placeholder="Search docs" />
+ <input type="hidden" name="check_keywords" value="yes" />
+ <input type="hidden" name="area" value="default" />
+ </form>
+</div>
+
+
+ </div>
+
+ <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+
+
+
+
+
+
+ <p class="caption"><span class="caption-text">Memory Format</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../format/README.html">Arrow specification documents</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../format/Guidelines.html">Implementation guidelines</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../format/Layout.html">Physical memory layout</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../format/Metadata.html">Metadata: Logical types, schemas, data headers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../format/IPC.html">Interprocess messaging / communication (IPC)</a></li>
+</ul>
+<p class="caption"><span class="caption-text">Languages</span></p>
+<ul class="current">
+<li class="toctree-l1"><a class="reference internal" href="../cpp/index.html">C++ Implementation</a></li>
+<li class="toctree-l1 current"><a class="reference internal" href="index.html">Python bindings</a><ul class="current">
+<li class="toctree-l2"><a class="reference internal" href="install.html">Installing PyArrow</a></li>
+<li class="toctree-l2"><a class="reference internal" href="memory.html">Memory and IO Interfaces</a></li>
+<li class="toctree-l2 current"><a class="current reference internal" href="#">Data Types and In-Memory Data Model</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="#type-metadata">Type Metadata</a></li>
+<li class="toctree-l3"><a class="reference internal" href="#schemas">Schemas</a></li>
+<li class="toctree-l3"><a class="reference internal" href="#arrays">Arrays</a><ul>
+<li class="toctree-l4"><a class="reference internal" href="#none-values-and-nan-handling">None values and NAN handling</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#list-arrays">List arrays</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#struct-arrays">Struct arrays</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#union-arrays">Union arrays</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#dictionary-arrays">Dictionary Arrays</a></li>
+</ul>
+</li>
+<li class="toctree-l3"><a class="reference internal" href="#record-batches">Record Batches</a></li>
+<li class="toctree-l3"><a class="reference internal" href="#tables">Tables</a></li>
+<li class="toctree-l3"><a class="reference internal" href="#custom-schema-and-field-metadata">Custom Schema and Field Metadata</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="ipc.html">Streaming, Serialization, and IPC</a></li>
+<li class="toctree-l2"><a class="reference internal" href="filesystems.html">File System Interfaces</a></li>
+<li class="toctree-l2"><a class="reference internal" href="plasma.html">The Plasma In-Memory Object Store</a></li>
+<li class="toctree-l2"><a class="reference internal" href="numpy.html">NumPy Integration</a></li>
+<li class="toctree-l2"><a class="reference internal" href="pandas.html">Pandas Integration</a></li>
+<li class="toctree-l2"><a class="reference internal" href="csv.html">Reading CSV files</a></li>
+<li class="toctree-l2"><a class="reference internal" href="parquet.html">Reading and Writing the Apache Parquet Format</a></li>
+<li class="toctree-l2"><a class="reference internal" href="extending.html">Using pyarrow from C++ and Cython Code</a></li>
+<li class="toctree-l2"><a class="reference internal" href="api.html">API Reference</a></li>
+<li class="toctree-l2"><a class="reference internal" href="development.html">Development</a></li>
+<li class="toctree-l2"><a class="reference internal" href="getting_involved.html">Getting Involved</a></li>
+</ul>
+</li>
+</ul>
+
+
+
+ </div>
+ </div>
+ </nav>
+
+ <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
+
+
+ <nav class="wy-nav-top" aria-label="top navigation">
+
+ <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+ <a href="../index.html">Apache Arrow</a>
+
+ </nav>
+
+
+ <div class="wy-nav-content">
+
+ <div class="rst-content">
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+<div role="navigation" aria-label="breadcrumbs navigation">
+
+ <ul class="wy-breadcrumbs">
+
+ <li><a href="../index.html">Docs</a> »</li>
+
+ <li><a href="index.html">Python bindings</a> »</li>
+
+ <li>Data Types and In-Memory Data Model</li>
+
+
+ <li class="wy-breadcrumbs-aside">
+
+
+ <a href="../_sources/python/data.rst.txt" rel="nofollow"> View page source</a>
+
+
+ </li>
+
+ </ul>
+
+
+ <hr/>
+</div>
+ <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+ <div itemprop="articleBody">
+
+ <div class="section" id="data-types-and-in-memory-data-model">
+<span id="data"></span><h1>Data Types and In-Memory Data Model<a class="headerlink" href="#data-types-and-in-memory-data-model" title="Permalink to this headline">¶</a></h1>
+<p>Apache Arrow defines columnar array data structures by composing type metadata
+with memory buffers, like the ones explained in the documentation on
+<a class="reference internal" href="memory.html#io"><span class="std std-ref">Memory and IO</span></a>. These data structures are exposed in Python through
+a series of interrelated classes:</p>
+<ul class="simple">
+<li><strong>Type Metadata</strong>: Instances of <code class="docutils literal notranslate"><span class="pre">pyarrow.DataType</span></code>, which describe a logical
+array type</li>
+<li><strong>Schemas</strong>: Instances of <code class="docutils literal notranslate"><span class="pre">pyarrow.Schema</span></code>, which describe a named
+collection of types. These can be thought of as the column types in a
+table-like object.</li>
+<li><strong>Arrays</strong>: Instances of <code class="docutils literal notranslate"><span class="pre">pyarrow.Array</span></code>, which are atomic, contiguous
+columnar data structures composed from Arrow Buffer objects</li>
+<li><strong>Record Batches</strong>: Instances of <code class="docutils literal notranslate"><span class="pre">pyarrow.RecordBatch</span></code>, which are a
+collection of Array objects with a particular Schema</li>
+<li><strong>Tables</strong>: Instances of <code class="docutils literal notranslate"><span class="pre">pyarrow.Table</span></code>, a logical table data structure in
+which each column consists of one or more <code class="docutils literal notranslate"><span class="pre">pyarrow.Array</span></code> objects of the
+same type.</li>
+</ul>
+<p>We will examine these in the sections below in a series of examples.</p>
+<div class="section" id="type-metadata">
+<span id="data-types"></span><h2>Type Metadata<a class="headerlink" href="#type-metadata" title="Permalink to this headline">¶</a></h2>
+<p>Apache Arrow defines language-agnostic, column-oriented data structures for
+array data. These include:</p>
+<ul class="simple">
+<li><strong>Fixed-length primitive types</strong>: numbers, booleans, dates and times,
+fixed-size binary, decimals, and other values that occupy a fixed number of bits</li>
+<li><strong>Variable-length primitive types</strong>: binary, string</li>
+<li><strong>Nested types</strong>: list, struct, and union</li>
+<li><strong>Dictionary type</strong>: An encoded categorical type (more on this later)</li>
+</ul>
+<p>Each logical data type in Arrow has a corresponding factory function for
+creating an instance of that type object in Python:</p>
+<div class="highlight-ipython notranslate"><div class="highlight"><pre><span></span><span class="gp">In [1]: </span><span class="kn">import</span> <span class="nn">pyarrow</span> <span class="kn">as</span> <span class="nn">pa</span>
+
+<span class="gp">In [2]: </span><span class="n">t1</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">int32</span><span class="p">()</span>
+
+<span class="gp">In [3]: </span><span class="n">t2</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">string</span><span class="p">()</span>
+
+<span class="gp">In [4]: </span><span class="n">t3</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">binary</span><span class="p">()</span>
+
+<span class="gp">In [5]: </span><span class="n">t4</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">binary</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
+
+<span class="gp">In [6]: </span><span class="n">t5</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">timestamp</span><span class="p">(</span><span class="s1">'ms'</span><span class="p">)</span>
+
+<span class="gp">In [7]: </span><span class="n">t1</span>
+<span class="gh">Out[7]: </span><span class="go">DataType(int32)</span>
+
+<span class="gp">In [8]: </span><span class="k">print</span><span class="p">(</span><span class="n">t1</span><span class="p">)</span>
+<span class="go">